I'm relatively new to R (coming from SAS). I need to select a different number of observations from each group. Groups are identified by the values of two variables:
ToSelect <- data.frame(
key1=c(1,1,1,1,1,2,2,2,2,2,2,2),
key2=c("a","a","b","b","b","a","a","a","a","b","b","b"),
var1=c(2,3,4,6,2,7,8,5,7,1,8,5)
)
NumObs <- data.frame(
key1=c(1,1,2,2),
key2=c("a","b","a","b"),
NumObs=c(1,2,2,1)
)
I tried this (from the question "Choose the first 80 observations for each level in R"):
ToSelect <- merge(x=ToSelect,y=NumObs,by=c("key1","key2"))
library(plyr)
Selected <- ddply(ToSelect, .(key1,key2), head, n = NumObs)
which gives
Error: length(n) == 1L is not TRUE
which is probably an obvious mistake to experts (n has to be a scalar, but NumObs is a vector?).
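For illustration, one way around the scalar requirement is to pass ddply an anonymous function that reads a single value of the merged NumObs column inside each group. This is only a minimal sketch: the name Merged is introduced here so the original ToSelect is not overwritten, and it assumes NumObs is constant within each key1/key2 group, as it is after the merge.

```r
library(plyr)

ToSelect <- data.frame(
  key1 = c(1,1,1,1,1,2,2,2,2,2,2,2),
  key2 = c("a","a","b","b","b","a","a","a","a","b","b","b"),
  var1 = c(2,3,4,6,2,7,8,5,7,1,8,5)
)
NumObs <- data.frame(
  key1 = c(1,1,2,2),
  key2 = c("a","b","a","b"),
  NumObs = c(1,2,2,1)
)

# Merged is a hypothetical name; it carries the per-group count as a column
Merged <- merge(x = ToSelect, y = NumObs, by = c("key1", "key2"))

# Inside each group z, NumObs is constant, so its first value is the scalar n
Selected <- ddply(Merged, .(key1, key2), function(z) head(z, n = z$NumObs[1]))
```

Here head receives a length-one n for every group, which is what the failed call above did not provide.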
From the same question, I tried:
Selected <- do.call(
rbind,
lapply(split(ToSelect, c(ToSelect$key1,ToSelect$key2)), head, NumObs)
)
which gives
Error: length(n) == 1L is not TRUE
In addition: Warning message:
In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
data length is not a multiple of split variable
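The warning most likely comes from c(ToSelect$key1, ToSelect$key2), which concatenates the two key columns into one vector twice as long as the data frame instead of pairing them up. A minimal base R sketch of the same idea, assuming the grouping is passed as a list of both keys and that n is taken from the first NumObs value in each group (Merged and groups are names introduced here for illustration):

```r
ToSelect <- data.frame(
  key1 = c(1,1,1,1,1,2,2,2,2,2,2,2),
  key2 = c("a","a","b","b","b","a","a","a","a","b","b","b"),
  var1 = c(2,3,4,6,2,7,8,5,7,1,8,5)
)
NumObs <- data.frame(
  key1 = c(1,1,2,2),
  key2 = c("a","b","a","b"),
  NumObs = c(1,2,2,1)
)

# Merged is a hypothetical name for the data with the per-group count attached
Merged <- merge(x = ToSelect, y = NumObs, by = c("key1", "key2"))

# A list of factors makes split() group by every key1/key2 combination;
# drop = TRUE discards combinations that do not occur in the data
groups <- split(Merged, list(Merged$key1, Merged$key2), drop = TRUE)

# Each group supplies its own scalar n
Selected <- do.call(rbind, lapply(groups, function(g) head(g, n = g$NumObs[1])))
rownames(Selected) <- NULL
```

Groups of different sizes are not a problem for split itself; the grouping vector only has to have one entry per row.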
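So the same error as before, plus a warning. Can I not use split if the groups have different lengths?
The only approach that worked for me was adding a sequence number within each group (the rle/sequence idea, here done with ddply) and then filtering on it: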
ToSelect <- ddply(ToSelect, .(key1, key2), function(z){
cbind(var1=z$var1,NumObs=z$NumObs,
data.frame(
SeqNum = seq_along(z$key2)
)
)
}
)
Selected <- ToSelect[ToSelect$SeqNum<=ToSelect$NumObs,c("key1","key2","var1")]
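The same sequence-number idea can also be sketched in base R with ave, which applies a function within groups and returns a vector aligned with the original rows. This is an illustrative alternative, not a claim that it is better than the ddply version; Merged is a name introduced here, and the result depends on the row order within each group after merge.

```r
ToSelect <- data.frame(
  key1 = c(1,1,1,1,1,2,2,2,2,2,2,2),
  key2 = c("a","a","b","b","b","a","a","a","a","b","b","b"),
  var1 = c(2,3,4,6,2,7,8,5,7,1,8,5)
)
NumObs <- data.frame(
  key1 = c(1,1,2,2),
  key2 = c("a","b","a","b"),
  NumObs = c(1,2,2,1)
)

# Merged is a hypothetical name for the data with the per-group count attached
Merged <- merge(x = ToSelect, y = NumObs, by = c("key1", "key2"))

# Number the rows 1, 2, 3, ... separately within each key1/key2 group
Merged$SeqNum <- ave(seq_along(Merged$var1), Merged$key1, Merged$key2,
                     FUN = seq_along)

# Keep a row only while its position is within the group's allowed count
Selected <- Merged[Merged$SeqNum <= Merged$NumObs, c("key1", "key2", "var1")]
```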