Remove vector elements that are substrings of another

Question


Is there a better way to achieve this? I would like to remove all elements from this vector that are substrings of other elements.

 words = c("please can you", "please can", "can you", "how did you", "did you", "have you")
 > words
 [1] "please can you" "please can" "can you" "how did you" "did you" "have you"

 library(data.table)
 library(stringr)
 dt = setDT(expand.grid(word1 = words, word2 = words, stringsAsFactors = FALSE))
 dt[, found := str_detect(word1, word2)]
 setdiff(words, dt[found == TRUE & word1 != word2, word2])
 [1] "please can you" "how did you" "have you"

It works, but it seems more complicated than necessary, and I'm interested in learning a more elegant way to do it.


string r

Akhil nair Oct 18 '15 at 19:14


1 answer

G. grothendieck · Accepted Answer · 2015-10-18T19:23:33+0000

Search for each element of words within words, keeping those that are found exactly once (that is, only in themselves):

 words[colSums(sapply(words, grepl, words, fixed = TRUE)) == 1]

giving:

 [1] "please can you" "how did you" "have you"
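To see why the one-liner works, it may help to split it into steps. The sketch below uses the same `words` vector from the question; the intermediate names `hits`, `counts`, and `result` are introduced here for illustration only and are not part of the original answer.

```r
words <- c("please can you", "please can", "can you",
           "how did you", "did you", "have you")

# Column j of `hits` marks which elements of `words` contain words[j]
# (fixed = TRUE treats each pattern as a literal string, not a regex).
hits <- sapply(words, grepl, words, fixed = TRUE)

# Every element contains itself, so a count of 1 means
# no other element contains it.
counts <- colSums(hits)

# Keep only the elements that are not substrings of another element.
result <- words[counts == 1]
print(result)
```

One caveat, assuming the input may contain repeated strings: an exact duplicate would be counted twice and both copies dropped, so calling `unique(words)` first is probably advisable in that case.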
