I have a data frame consisting of an ID (the same for each element in a group), two datetimes, and the time interval between them. One of the datetime columns is my relevant time marker. I would now like to get a subset of the data frame containing the earliest record for each group. The records (especially the time intervals) should remain intact.
My first approach was to sort the frame by (1) ID and (2) the relevant datetime. However, I could not work out how to return the first record of each new group.
Then I looked at aggregate() as well as ddply(), but in neither case could I find an option that simply returns the first record without applying the aggregation function to the time interval value.
Is there a (simple) way to accomplish this?
Addition: Perhaps I was unclear in mentioning aggregate() and ddply(). I don't necessarily need to aggregate. Given that the data frame is sorted so that the first row of each new group is the row I am looking for, it would be enough to simply return a subset containing each row whose ID differs from that of the previous row (which is the start row of each new group).
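To make the idea in this addition concrete, here is a minimal base R sketch of "keep every row whose ID differs from the previous row's ID". It assumes the frame is already sorted by ID and then by the relevant datetime. The frame `df`, its values, and the name `is_group_start` are made up for illustration and are not my real data.

```r
# Toy frame, already sorted by ID and then by Start (illustrative values only)
df <- data.frame(
  ID       = c(1322L, 1454L, 1454L, 1454L, 1727L, 1727L),
  Start    = c(5, 1, 3, 4, 7, 8),
  Interval = c(2404, 1206, 3246, 6608, 2704, 12375)
)

# TRUE where a row's ID differs from the ID of the row before it;
# the first row always starts a new group
is_group_start <- c(TRUE, df$ID[-1] != df$ID[-nrow(df)])

# Logical row indexing keeps the selected rows whole, Interval included
first_rows <- df[is_group_start, ]
print(first_rows)
```

Because this only selects rows, nothing is aggregated and the interval values stay untouched. The sketch assumes the frame has at least one row.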
Sample data:
structure(list(ID = c(1454L, 1322L, 1454L, 1454L, 1855L, 1669L, 1727L, 1727L, 1488L), Line = structure(c(2L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("A", "B", "C"), class = "factor"), Start = structure(c(1357038060, 1357221074, 1357369644, 1357834170, 1357913412, 1358151763, 1358691675, 1358789411, 1359538400 ), class = c("POSIXct", "POSIXt"), tzone = ""), End = structure(c(1357110430, 1357365312, 1357564413, 1358230679, 1357978810, 1358674600, 1358853933, 1359531923, 1359568151), class = c("POSIXct", "POSIXt"), tzone = ""), Interval = c(1206.16666666667, 2403.96666666667, 3246.15, 6608.48333333333, 1089.96666666667, 8713.95, 2704.3, 12375.2, 495.85)), .Names = c("ID", "Line", "Start", "End", "Interval"), row.names = c(NA, -9L), class = "data.frame")
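For reference, this is roughly what I have in mind, sketched on the ID, Start and Interval columns of the sample data. It is only a sketch under some assumptions: Start is kept as plain seconds since the epoch instead of POSIXct, the Line and End columns are left out for brevity, and the names `sorted` and `earliest` are just placeholders. It combines the sorting from my first approach with base R's duplicated(), which may or may not be the best tool for this.

```r
# Three columns of the sample data; Start as seconds since the epoch
df <- data.frame(
  ID       = c(1454L, 1322L, 1454L, 1454L, 1855L, 1669L, 1727L, 1727L, 1488L),
  Start    = c(1357038060, 1357221074, 1357369644, 1357834170, 1357913412,
               1358151763, 1358691675, 1358789411, 1359538400),
  Interval = c(1206.16666666667, 2403.96666666667, 3246.15, 6608.48333333333,
               1089.96666666667, 8713.95, 2704.3, 12375.2, 495.85)
)

# Sort by ID, then by Start, so the earliest record leads each group
sorted <- df[order(df$ID, df$Start), ]

# duplicated() flags every repeat of an ID after its first occurrence,
# so negating it keeps exactly one whole row per ID
earliest <- sorted[!duplicated(sorted$ID), ]
print(earliest)
```

The desired result has one row per ID (six rows for the sample data), each carrying its original Interval value unchanged.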