Here's a relatively fast way to do this with data.table:
require(data.table)
vv <- vapply(y, length, 0L)
DT <- data.table(y = unlist(y), id = rep(seq_along(y), vv), pos = sequence(vv))
setkey(DT, y)

# OLD CODE, which does not take care of no-match entries (commented):
# DT[J(c("chocolate", "good")), list(list(pos)), by=id]$V1

setkey(DT[J(c("chocolate", "good"))], id)[J(seq_along(vv)), list(list(pos))]$V1
Idea:
First, we unlist the input into a column of DT named y, together with two other columns, id and pos. id records which list element each word came from, and pos records the word's position within that element. By setting y as the key column, we can perform a fast binary-search subset on the search words; from that subset we get the corresponding pos values for each id. Before collecting all the pos values for each id into a list and extracting that list column (V1), we take care of the elements that had no match for our query: we set the key to id after the first subset and join against all possible id values (seq_along(vv)), which yields NA for elements with no matching entries.
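To make the idea concrete, here is a minimal sketch on a two-element toy list. One caveat: on current data.table (>= 1.9.4) the grouped join in the last step needs an explicit by = .EACHI, which older versions did implicitly:

```r
library(data.table)

# Toy input: a list of two character vectors
y <- list(c("I", "like", "chocolate", "cake"),
          c("chocolate", "cake", "is", "good"))

vv <- vapply(y, length, 0L)   # element lengths: 4 4

# Long format: one row per word, remembering where it came from
DT <- data.table(y   = unlist(y),              # the words themselves
                 id  = rep(seq_along(y), vv),  # which list element
                 pos = sequence(vv))           # position within that element
setkey(DT, y)

# Binary-search subset on the search words, re-key on id, then join
# against every possible id so unmatched elements give NA rather than
# being dropped silently.
res <- setkey(DT[J(c("chocolate", "good"))], id)[J(seq_along(vv)),
                                                 list(list(pos)),
                                                 by = .EACHI]$V1
res
# [[1]]
# [1] 3
#
# [[2]]
# [1] 1 4
```

"chocolate" occurs at position 3 of the first element; "chocolate" and "good" occur at positions 1 and 4 of the second, matching what which(x %in% ...) would return per element.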
Benchmarking against the lapply code from your post:
x <- list(c('I', 'like', 'chocolate', 'cake'),
          c('chocolate', 'cake', 'is', 'good'))
y <- rep(x, 5000)

require(data.table)
arun <- function() {
    vv <- vapply(y, length, 0L)
    DT <- data.table(y = unlist(y), id = rep(seq_along(y), vv), pos = sequence(vv))
    setkey(DT, y)
    setkey(DT[J(c("chocolate", "good"))], id)[J(seq_along(vv)), list(list(pos))]$V1
}

tyler <- function() {
    lapply(y, function(x) which(x %in% c("chocolate", "good")))
}

require(microbenchmark)
microbenchmark(a1 <- arun(), a2 <- tyler(), times = 50)

Unit: milliseconds
          expr       min        lq    median        uq       max neval
  a1 <- arun()  30.71514  31.92836  33.19569  39.31539  88.56282    50
 a2 <- tyler() 626.67841 669.71151 726.78236 785.86444 955.55803    50

> identical(a1, a2)