Return the first row of a group

Question


I have a data frame consisting of an ID (the same for every row in a group), two datetimes, and the time interval between them. One of the two datetime columns is the time marker I care about. Now I would like to get a subset of the data frame consisting of the earliest record for each group. The records (especially the time intervals) should remain intact.

My first approach was to sort the frame by 1. ID and 2. the relevant datetime. However, I could not figure out how to return the first record of each new group.

Then I looked at the aggregate() function as well as ddply(), but in neither case could I find an option that simply returns the first record without applying the aggregate function to the time interval value.

Is there a (simple) way to accomplish this?
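One base-R pattern that picks whole rows instead of aggregating each column (not taken from the answers in this thread) is to split the frame by ID and keep the row with the smallest start time in each piece. A minimal sketch, assuming a small hypothetical data frame `toy` with columns `ID`, `Start` and `Interval`; it does not require the data to be sorted first:

```r
# Hypothetical toy data: an ID, a start time and an interval (minutes)
toy <- data.frame(
  ID       = c(2L, 1L, 2L, 1L),
  Start    = as.POSIXct(c("2013-01-03 10:00:00", "2013-01-02 09:00:00",
                          "2013-01-01 08:00:00", "2013-01-04 12:00:00"),
                        tz = "UTC"),
  Interval = c(30, 45, 60, 15)
)

# Split the rows by ID, keep the row with the earliest Start in each piece,
# then stack the one-row data frames back together. Whole rows are kept,
# so Interval is never passed through an aggregate function.
earliest <- do.call(rbind, lapply(split(toy, toy$ID), function(d) {
  d[which.min(d$Start), ]
}))
print(earliest)
```

Because `which.min()` returns the position of the first minimum, ties on `Start` resolve to whichever tied row comes first within its group.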

Addition: Perhaps I was unclear by mentioning aggregate() and ddply(). I don't need to aggregate anything. Since the data frame is sorted so that the first row of each new group is the row I am looking for, it would be enough to return a subset containing every row whose ID differs from the ID of the previous row (i.e. the first row of each new group).
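The "ID differs from the previous row" idea can be written directly in base R by comparing the ID column with itself shifted by one row. A minimal sketch, using a hypothetical data frame `toy` and sorting it first so that the assumption in the question holds:

```r
# Hypothetical toy data: an ID, a start time and an interval (minutes)
toy <- data.frame(
  ID       = c(2L, 1L, 2L, 1L),
  Start    = as.POSIXct(c("2013-01-03 10:00:00", "2013-01-02 09:00:00",
                          "2013-01-01 08:00:00", "2013-01-04 12:00:00"),
                        tz = "UTC"),
  Interval = c(30, 45, 60, 15)
)

# Sort by ID first, then by Start within each ID
sorted <- toy[order(toy$ID, toy$Start), ]

# A row starts a new group if it is the first row or if its ID differs
# from the ID of the row just before it (assumes at least one row)
n <- nrow(sorted)
is_first <- c(TRUE, sorted$ID[-1] != sorted$ID[-n])

first_rows <- sorted[is_first, ]
print(first_rows)
```

Once the data is sorted by ID, `is_first` is the same vector as `!duplicated(sorted$ID)`.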

Sample data:

 data <- structure(list(
     ID = c(1454L, 1322L, 1454L, 1454L, 1855L, 1669L, 1727L, 1727L, 1488L),
     Line = structure(c(2L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("A", "B", "C"), class = "factor"),
     Start = structure(c(1357038060, 1357221074, 1357369644, 1357834170, 1357913412, 1358151763, 1358691675, 1358789411, 1359538400), class = c("POSIXct", "POSIXt"), tzone = ""),
     End = structure(c(1357110430, 1357365312, 1357564413, 1358230679, 1357978810, 1358674600, 1358853933, 1359531923, 1359568151), class = c("POSIXct", "POSIXt"), tzone = ""),
     Interval = c(1206.16666666667, 2403.96666666667, 3246.15, 6608.48333333333, 1089.96666666667, 8713.95, 2704.3, 12375.2, 495.85)),
   .Names = c("ID", "Line", "Start", "End", "Interval"), row.names = c(NA, -9L), class = "data.frame")



fr3d-5 Oct 18 '13 at 13:33


2 answers

Since you did not provide any data, here is an example in base R using some sample data:

 df <- data.frame(group=c("a", "b"), value=1:8)
 ## Order the data frame by the variable of interest
 df <- df[order(df$value),]
 ## Aggregate
 aggregate(df, list(df$group), FUN=head, 1)

EDIT: As Ananda suggests in his comment, the following aggregate call is better:

 aggregate(.~group, df, FUN=head, 1)
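To see what the formula version returns, here it is on the same toy data. `head(x, 1)` is applied to each non-grouping column separately, so the values come from one consistent row only because every column shares the sorted row order. Note that the formula method's default `na.action = na.omit` drops rows containing `NA` before grouping, which can change which row counts as the first one.

```r
# Same toy data as above: group alternates "a", "b"; value runs 1..8
df <- data.frame(group = c("a", "b"), value = 1:8)

# Sort by the variable of interest so the first row of each group is the wanted one
df <- df[order(df$value), ]

# head(x, 1) is applied to every non-grouping column within each group
res <- aggregate(. ~ group, df, FUN = head, 1)
print(res)
```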

If you prefer plyr, you can replace aggregate with ddply:

 library(plyr)
 ddply(df, "group", head, 1)


juba Oct 18 '13 at 13:39


fr3d-5 · Accepted Answer · 2013-10-21T16:17:35+0000

After reproducing the example data frame and testing, I found a way to get the desired result:

 ## Order the data by the relevant columns (ID, Start)
 ordered_data <- data[order(data$ID, data$Start),]
 ## Keep the first row for each new ID
 final <- ordered_data[!duplicated(ordered_data$ID),]
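A self-contained run of this approach on the question's sample data, rebuilt with data.frame() instead of the dput() structure. The UTC time zone is an assumption made here for reproducibility; it does not affect the ordering.

```r
# The question's sample data (same values as the dput() output in the question)
data <- data.frame(
  ID       = c(1454L, 1322L, 1454L, 1454L, 1855L, 1669L, 1727L, 1727L, 1488L),
  Line     = factor(c("B", "A", "C", "A", "A", "A", "A", "A", "A")),
  Start    = as.POSIXct(c(1357038060, 1357221074, 1357369644, 1357834170, 1357913412,
                          1358151763, 1358691675, 1358789411, 1359538400),
                        origin = "1970-01-01", tz = "UTC"),
  End      = as.POSIXct(c(1357110430, 1357365312, 1357564413, 1358230679, 1357978810,
                          1358674600, 1358853933, 1359531923, 1359568151),
                        origin = "1970-01-01", tz = "UTC"),
  Interval = c(1206.16666666667, 2403.96666666667, 3246.15, 6608.48333333333,
               1089.96666666667, 8713.95, 2704.3, 12375.2, 495.85)
)

# Sort by ID, then by Start within each ID
ordered_data <- data[order(data$ID, data$Start), ]

# duplicated() is FALSE for the first occurrence of each ID, so negating it
# keeps exactly one row (the earliest) per ID, with all columns intact
final <- ordered_data[!duplicated(ordered_data$ID), ]
print(final)
```

The six rows returned are the original rows 2, 1, 9, 6, 7 and 5, i.e. the earliest record for each of the six IDs, with `Interval` unchanged.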
