We can use base R methods with gregexpr and Reduce:
Reduce(`+`, lapply(dict, function(x) lengths(regmatches(txt, gregexpr(x, txt)))))
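Or a faster approach, which counts the matches directly from the match.length attribute instead of extracting the matched substrings with regmatches, would be: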
Reduce(`+`, lapply(dict, function(x) vapply(gregexpr(x, txt),
function(y) sum(attr(y, "match.length")>0), 0)))
NOTE: This method places no limit on the size of the data set or on the number of dictionary elements, so it also works with large inputs.
data
txt <- c("I am an angry tiger.", "I am unhappy clam.", "I am an angry and unhappy tiger.",
"I am an angry, angry, tiger." ,"Beep boop.")
dict <- c("angry", "unhappy")
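As a quick check, both approaches can be run on the sample data and compared. This is only a sketch: the names counts_regmatches, counts_matchlen and counts_fixed are introduced here for illustration, and the fixed = TRUE variation is an assumption about what you might want if dictionary entries contain regex metacharacters, not part of the original answer.

```r
# Sample data repeated from above so this block runs on its own.
txt <- c("I am an angry tiger.", "I am unhappy clam.",
         "I am an angry and unhappy tiger.",
         "I am an angry, angry, tiger.", "Beep boop.")
dict <- c("angry", "unhappy")

# Approach 1: extract the matched substrings, then count them per string.
counts_regmatches <- Reduce(`+`, lapply(dict, function(x)
  lengths(regmatches(txt, gregexpr(x, txt)))))

# Approach 2: count positive match lengths; gregexpr reports -1 when
# a string has no match, so those contribute 0.
counts_matchlen <- Reduce(`+`, lapply(dict, function(x)
  vapply(gregexpr(x, txt),
         function(y) sum(attr(y, "match.length") > 0), 0)))

# Illustrative variation (assumption): match dictionary entries as
# literal strings rather than as regular expressions.
counts_fixed <- Reduce(`+`, lapply(dict, function(x)
  vapply(gregexpr(x, txt, fixed = TRUE),
         function(y) sum(attr(y, "match.length") > 0), 0)))

print(counts_regmatches)
# [1] 1 1 2 2 0
```

All three vectors hold the same per-string totals for this data. Note that these are substring matches, so a dictionary entry such as "happy" would also be counted inside "unhappy".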