Selecting the first n rows of each group, where n differs from group to group

I would like to select the first 2, 3, 0 and 4 rows of groups 1, 2, 3 and 4, respectively, in this data frame:

 > f <- data.frame(group = c(1,1,1,2,2,3,4), y = c(1:7))
 > f
   group y
 1     1 1
 2     1 2
 3     1 3
 4     2 4
 5     2 5
 6     3 6
 7     4 7

and get the following data frame:

   group y
       1 1
       1 2
       2 4
       2 5
       4 7

I tried using by and head, but head does not accept a vector of row counts.
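One way to make by() and head() work together, sketched here as an assumption rather than taken from the answers, is to look up each group's count by name inside the function, so head() only ever receives a single number. The named vector k is a hypothetical lookup introduced for this illustration.

```r
# Data from the question
f <- data.frame(group = c(1, 1, 1, 2, 2, 3, 4), y = 1:7)

# Hypothetical lookup: number of rows to keep, named by group value
k <- c("1" = 2, "2" = 3, "3" = 0, "4" = 4)

# by() calls the function once per group; each call passes head() one count
res <- do.call(rbind, by(f, f$group, function(x) head(x, k[[as.character(x$group[1])]])))
res
```

Group 3 asks for zero rows and contributes nothing; group 4 asks for four rows but head() simply returns the one row it has.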

Thank you for your help.


A version of the function based on row indices.

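 library(plyr)  # needed for ddply() in fun2

 # Both functions assume f is sorted by group and nf holds one row count per group
 fun1 <- function() {
   # index of the first row of each group
   idx <- c(0, which(diff(f$group) != 0)) + 1
   # never take more rows than a group has, or the sequence runs into the next group
   n <- pmin(nf, diff(c(idx, nrow(f) + 1)))
   idx2 <- unlist(lapply(seq_along(nf),
                         function(x) seq.int(from = idx[x], length.out = n[x])),
                  use.names = FALSE)
   f[idx2, ]
 }

 fun2 <- function() {
   # x[1, 1] is the group value, used directly as an index into nf
   ddply(f, .(group), function(x) head(x, nf[x[1, 1]]))
 }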
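The index arithmetic in fun1 is easier to follow on the question's seven-row example. The sketch below assumes f is already sorted by group and prints each intermediate vector; the pmin() cap matters here, because without it the request for three rows of group 2 would also pick up row 6 (group 3), and the request for four rows of group 4 would produce NA rows 8 to 10.

```r
f  <- data.frame(group = c(1, 1, 1, 2, 2, 3, 4), y = 1:7)  # sorted by group
nf <- c(2, 3, 0, 4)                                        # rows wanted per group

idx  <- c(0, which(diff(f$group) != 0)) + 1  # first row of each group: 1 4 6 7
size <- diff(c(idx, nrow(f) + 1))            # rows in each group:     3 2 1 1
n    <- pmin(nf, size)                       # rows actually taken:    2 2 0 1

# Expand each (start, count) pair into row numbers, then subset once
rows <- unlist(Map(function(s, m) s + seq_len(m) - 1, idx, n), use.names = FALSE)
rows  # 1 2 4 5 7
f[rows, ]
```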

Test data (of the size suggested by the author of the question):

 f <- data.frame(group = sample(1:1000, 50000, TRUE), y = c(1:50000))
 f <- f[order(f$group), ]
 nf <- rpois(length(unique(f$group)), 3)

 system.time(fun1())
 system.time(fun2())

On my system, fun1 is about 60 times faster than fun2.


With a more traditional lapply:

 k <- c(2,3,0,4)           # rows to keep for groups 1, 2, 3, 4
 fs <- split(f, f$group)   # one data frame per group, in sorted group order
 do.call(rbind, lapply(seq_along(k), function(i) head(fs[[i]], k[i])))

Result:

   group y
 1     1 1
 2     1 2
 4     2 4
 5     2 5
 7     4 7
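A vectorized alternative, not taken from either answer and offered only as a sketch, numbers the rows within each group with ave() and keeps those whose position does not exceed that group's limit. It does not need f to be sorted; like the lapply version, it assumes k is ordered like sort(unique(f$group)), which is the order split() uses.

```r
f <- data.frame(group = c(1, 1, 1, 2, 2, 3, 4), y = 1:7)
k <- c(2, 3, 0, 4)  # rows to keep, in sorted group order

# Position of each row within its own group: 1 2 3 1 2 1 1
pos <- ave(seq_along(f$group), f$group, FUN = seq_along)

# Limit that applies to each row's group: 2 2 2 3 3 0 4
limit <- k[match(f$group, sort(unique(f$group)))]

res <- f[pos <= limit, ]
res
```

Because it builds one logical vector and subsets once, this avoids splitting the data frame and rbind-ing the pieces back together.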

Using plyr:

 library(plyr)
 rows <- c(2,3,0,4)   # rows to keep for groups 1, 2, 3, 4
 # x[1,1] is the group value, used as an index into rows
 ddply(f, .(group), function(x) head(x, rows[x[1,1]]))

   group y
 1     1 1
 2     1 2
 3     2 4
 4     2 5
 5     4 7

Edit: I had misunderstood the question, so I updated the answer.


Source: https://habr.com/ru/post/1342674/

