Selecting the first n rows of each group, with n varying by group

Question


I'd like to select the first 2, 3, 0 and 4 rows, respectively, of each group in this data frame:

    > f <- data.frame(group = c(1,1,1,2,2,3,4), y = 1:7)
    > f
      group y
    1     1 1
    2     1 2
    3     1 3
    4     2 4
    5     2 5
    6     3 6
    7     4 7

and get the following data frame:

      group y
    1     1 1
    2     1 2
    4     2 4
    5     2 5
    7     4 7

I tried using by and head, but head does not accept a vector of row counts.

Thank you for your help.
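For reference, the by() attempt can be made to work: head() only ever needs one n per call, so the function passed to by() can look up the count for the group it receives. A minimal sketch, assuming k is listed in sorted group order; grp, parts and res are illustrative names.

```r
f <- data.frame(group = c(1, 1, 1, 2, 2, 3, 4), y = 1:7)
k <- c(2, 3, 0, 4)             # rows wanted for groups 1, 2, 3, 4 (sorted order)

grp <- sort(unique(f$group))   # group labels in the same order as k

# by() passes each group's subset to the function; head() gets a single n,
# found by matching this subset's group label against grp
parts <- by(f, f$group, function(d) head(d, k[match(d$group[1], grp)]))

# stack the per-group pieces back into one data frame
res <- do.call(rbind, parts)
res
```

The rows come back in the requested order; only the row names differ from the question's target, because rbind builds them from the group labels.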


Tags: r, dataframe

Tony Mar 07 '11 at 17:45


3 answers

With a more traditional lapply:

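    k <- c(2,3,0,4)            # rows wanted for each group, in sorted group order
    fs <- split(f, f$group)    # one data frame per group
    # take the first k[i] rows of the i-th group and stack the pieces
    do.call(rbind, lapply(seq_along(k), function(i) head(fs[[i]], k[i])))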

Result:

      group y
    1     1 1
    2     1 2
    4     2 4
    5     2 5
    7     4 7
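The same split-then-head idea can be written a little more compactly with Map(), which pairs each group's subset with its count and calls head() on each pair. This is a sketch, assuming (as in the lapply answer) that k follows the sorted group order that split() produces; unname() drops the group-label names so rbind keeps the original row names rather than building new ones from the labels.

```r
f <- data.frame(group = c(1, 1, 1, 2, 2, 3, 4), y = 1:7)
k <- c(2, 3, 0, 4)   # rows wanted per group, in sorted group order

# Map(head, subsets, counts) calls head(subset_i, count_i) for each group
pieces <- Map(head, split(f, f$group), k)

# combine; a zero-row piece (group 3 here) simply contributes nothing
res <- do.call(rbind, unname(pieces))
res
```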


Aaron Mar 07 '11 at 18:08


Using plyr:

    library(plyr)
    rows <- c(2,3,0,4)
    ddply(f, .(group), function(x) head(x, rows[x[1,1]]))

      group y
    1     1 1
    2     1 2
    3     2 4
    4     2 5
    5     4 7

Edit: I misunderstood the question at first, so I updated the answer.
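Note that rows[x[1,1]] uses the group value itself as a position in rows, so it only lines up when the groups are coded 1, 2, ..., n. A sketch of a label-keyed lookup instead, using hypothetical group codes 10, 20, 30, 40 and a counts vector named by group label (plain base R, no plyr):

```r
# Hypothetical data: same rows as the question, but groups coded 10/20/30/40
f2 <- data.frame(group = c(10, 10, 10, 20, 20, 30, 40), y = 1:7)

# counts keyed by group label rather than by position
rows <- c("10" = 2, "20" = 3, "30" = 0, "40" = 4)

fs <- split(f2, f2$group)   # list whose names are the group labels
res <- do.call(rbind, lapply(names(fs), function(g) head(fs[[g]], rows[[g]])))
res
```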


Sacha epskamp Mar 07 '11 at 17:52


Wojciech sobala · Accepted Answer · 2011-03-07T21:31:24+0000

A version of the function that works with row indexes, compared against the ddply version:

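    library(plyr)  # for ddply() in fun2

    # Both functions read the globals f (sorted by group) and nf (rows per group)
    fun1 <- function(){
      idx <- c(0, which(diff(f$group) != 0)) + 1   # first row of each group
      # cap each count at the group's size so a run cannot spill into the next group
      len <- pmin(nf, diff(c(idx, nrow(f) + 1)))
      idx2 <- unlist(lapply(seq_along(len), function(x)
        seq.int(from = idx[x], length.out = len[x])), use.names = FALSE)
      f[idx2, ]
    }

    fun2 <- function(){
      ddply(f, .(group), function(x) head(x, nf[x[1, 1]]))
    }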

Test data (size suggested by the question author):

    f <- data.frame(group = sample(1:1000, 50000, TRUE), y = 1:50000)
    f <- f[order(f$group), ]
    nf <- rpois(length(unique(f$group)), 3)

    system.time(fun1())
    system.time(fun2())

On my system, fun1 is about 60 times faster than fun2.
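The lapply() loop in fun1 can also be replaced by building all indexes at once: rep() repeats each group's first-row index and sequence() supplies the offsets within each run. This is a sketch, not part of the original answer; fun3 is a hypothetical name, and like fun1 it assumes the data are sorted by group and that there is one count per group. It has not been timed against fun1.

```r
fun3 <- function(data, counts) {
  idx  <- c(0, which(diff(data$group) != 0)) + 1   # first row of each group
  size <- diff(c(idx, nrow(data) + 1))             # rows available per group
  len  <- pmin(counts, size)                       # never take more than exist
  # sequence(len) is 1..len[1], 1..len[2], ...; shift each run to its group start
  idx2 <- rep(idx, len) + sequence(len) - 1L
  data[idx2, ]
}

f <- data.frame(group = c(1, 1, 1, 2, 2, 3, 4), y = 1:7)
res <- fun3(f, c(2, 3, 0, 4))
res
```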
