Is there a better way to achieve this? I would like to remove every element of this vector that is a substring of another element.
    words = c("please can you", "please can", "can you", "how did you", "did you", "have you")
    > words
    [1] "please can you" "please can" "can you" "how did you" "did you" "have you"

    library(data.table)
    library(stringr)
    dt = setDT(expand.grid(word1 = words, word2 = words, stringsAsFactors = FALSE))
    dt[, found := str_detect(word1, word2)]
    setdiff(words, dt[found == TRUE & word1 != word2, word2])
    [1] "please can you" "how did you" "have you"
It works, but it feels like overkill, and I'm interested in learning a more elegant way to do it.
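For comparison, here is a minimal base R sketch of the same idea without building the full cross-join table. It is only one possible approach, not necessarily the most elegant. It assumes the elements of `words` are unique and should be matched as literal text, so it uses `grepl(..., fixed = TRUE)` instead of treating each element as a regular expression the way `str_detect` does by default. The names `n_containing` and `result` are made up for illustration.

```r
words <- c("please can you", "please can", "can you",
           "how did you", "did you", "have you")

# For each element, count how many elements of `words` contain it as a
# literal substring (fixed = TRUE turns off regex interpretation).
n_containing <- vapply(
  words,
  function(w) sum(grepl(w, words, fixed = TRUE)),
  integer(1),
  USE.NAMES = FALSE
)

# Every element contains itself, so a count of 1 means that no *other*
# element contains it. Assumption: `words` has no duplicates; otherwise
# duplicated entries would be dropped as well.
result <- words[n_containing == 1]
result
```

This still does a quadratic number of comparisons, like the `expand.grid` version, but it avoids the extra packages and the intermediate table.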