Suppose we have the following data:
    set.seed(123)
    dat <- data.frame(var1=c(10,35,13,19,15,20,19), id=c(1,1,2,2,2,3,4))
    (sampledIDs <- sample(min(dat$id):max(dat$id), size=3, replace=TRUE))
    > [1] 2 4 2
sampledIDs is a vector of identifiers sampled (with replacement) from the range of dat$id. I need code that produces the following result (and that also works for a large data set with many variables):
    var1 id
      13  2
      19  2
      15  2
      19  4
      13  2
      19  2
      15  2
The code dat[which(dat$id%in%sampledIDs),] does not give me what I want, since its result is:

    var1 id
      13  2
      19  2
      15  2
      19  4

Here the subject with dat$id==2 appears only once, even though it was sampled twice (I understand why this is the result, but I don't know how to get what I want). Can anybody help?
EDIT: Thanks for the answers. Here are the timings for all the answers (for those who are interested):
      test                                                                    replications elapsed relative user.self
    3 dat[unlist(lapply(sampledIDs, function(x) which(x == dat$id))), ]               1000    0.67    1.000      0.64
    1 dat[which(sapply(sampledIDs, "==", dat$id), arr.ind = TRUE)[, 1], ]             1000    0.67    1.000      0.67
    2 do.call(rbind, split(dat, dat$id)[as.character(sampledIDs)])                    1000    1.83    2.731      1.83
    4 setkey(setDT(dat), id)[J(sampledIDs)]                                           1000    1.33    1.985      1.33
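For anyone who wants to try it, here is a minimal sketch of the lapply/which expression from the table (test 3), applied to the example data. Two things in it are my own assumptions and not part of the answers: sampledIDs is hard-coded to c(2, 4, 2), because the output of sample() for a given seed can differ between R versions, and the names rowIdx and res are just illustrative.

```r
# Example data from the question
dat <- data.frame(var1 = c(10, 35, 13, 19, 15, 20, 19),
                  id   = c(1, 1, 2, 2, 2, 3, 4))

# Assumption: hard-coded to the draw shown in the question, rather than
# calling sample(), so the result does not depend on the R version
sampledIDs <- c(2, 4, 2)

# For each sampled id (in order, duplicates included), collect the
# positions of all rows that belong to that id
rowIdx <- unlist(lapply(sampledIDs, function(x) which(x == dat$id)))

# Index by position: a position that occurs twice yields the row twice
res <- dat[rowIdx, ]
rownames(res) <- NULL  # drop the duplicated row names (3, 4, 5, 7, 3.1, ...)
print(res)
```

The difference from the %in% version is that %in% only produces one TRUE/FALSE per row of dat, so a row can be selected at most once, whereas indexing by a vector of positions repeats a row every time its position occurs.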