Here's how I would do it:
    keyword <- "moon"
    lookaround <- 2
    pattern <- paste0("([[:alpha:]]+ ){0,", lookaround, "}", keyword,
                      "( [[:alpha:]]+){0,", lookaround, "}")
    regmatches(str, regexpr(pattern, str))[[1]]
Idea: match between 0 and “lookaround” (here 2) words, each a run of alphabetic characters followed by a space, then the “keyword” (here “moon”), then between 0 and “lookaround” further words, each preceded by a space. The regexpr function returns the start position and the length of the first match of this pattern. regmatches, which wraps this call, then uses those values to extract the matching substring.
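To make the two steps concrete, here is a minimal sketch that runs them separately on the example sentence. It only uses base R; the variable name m is just an illustrative choice.

```r
str <- "The cow jumped over the moon with a silver plate in its mouth"
keyword <- "moon"
lookaround <- 2
pattern <- paste0("([[:alpha:]]+ ){0,", lookaround, "}", keyword,
                  "( [[:alpha:]]+){0,", lookaround, "}")

# The assembled regular expression
pattern
# "([[:alpha:]]+ ){0,2}moon( [[:alpha:]]+){0,2}"

# regexpr() reports where the first match starts and how long it is
m <- regexpr(pattern, str)
as.integer(m)             # start position: 16
attr(m, "match.length")   # match length: 20

# regmatches() uses these two values to cut out the substring
regmatches(str, m)
# "over the moon with a"
```

Because the words are matched with [[:alpha:]], a neighbouring token containing digits or punctuation would not count as a word, so the window can come out shorter than “lookaround” on that side.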
Note: regexpr can be replaced with gregexpr if you want to find more than one occurrence of the same pattern.
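As a rough sketch of the gregexpr variant, assuming a made-up sentence in which the keyword occurs twice (the sentence is only for illustration):

```r
# Hypothetical input with two occurrences of the keyword
str <- "the moon rose and the moon set over the hills"
pattern <- "([[:alpha:]]+ ){0,2}moon( [[:alpha:]]+){0,2}"

# gregexpr() finds all non-overlapping matches; regmatches() then
# returns a list with one character vector per input string
res <- regmatches(str, gregexpr(pattern, str))[[1]]
res
# "the moon rose and" "the moon set over"
```

Matches do not overlap, so words consumed by one match are not available as context for the next one.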
Here's a benchmark on larger data comparing Hong's answer with this one:
    str <- "The cow jumped over the moon with a silver plate in its mouth"
    ll <- rep(str, 1e5)

    hong <- function(str) {
        str <- strsplit(str, " ")
        sapply(str, function(y) {
            i <- which(y == "moon")
            paste(y[seq(max(1, (i-2)), min((i+2), length(y)))], collapse = " ")
        })
    }

    arun <- function(str) {
        keyword <- "moon"
        lookaround <- 2
        pattern <- paste0("([[:alpha:]]+ ){0,", lookaround, "}", keyword,
                          "( [[:alpha:]]+){0,", lookaround, "}")
        regmatches(str, regexpr(pattern, str))
    }

    require(microbenchmark)
    microbenchmark(t1 <- hong(ll), t2 <- arun(ll), times = 10)