Remove vector elements that are substrings of another

Is there a better way to achieve this? I would like to remove all elements from this vector that are substrings of other elements.

 words = c("please can you", "please can", "can you", "how did you", "did you", "have you")
 > words
 [1] "please can you" "please can" "can you" "how did you" "did you" "have you"

 library(data.table)
 library(stringr)
 dt = setDT(expand.grid(word1 = words, word2 = words, stringsAsFactors = FALSE))
 dt[, found := str_detect(word1, word2)]
 setdiff(words, dt[found == TRUE & word1 != word2, word2])
 [1] "please can you" "how did you" "have you"

It works, but it seems more convoluted than necessary, and I'm interested in learning a more elegant way to do it.

1 answer

Search for each element of words within every element of words, keeping those that are found exactly once (that is, only in themselves):

 words[colSums(sapply(words, grepl, words, fixed = TRUE)) == 1] 

giving:

 [1] "please can you" "how did you" "have you" 
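
To see why the count-of-one test works, it may help to look at the intermediate match matrix. The sketch below is only an illustration of the one-liner above, not part of the original answer. The helper name drop_substrings is hypothetical, and the call to unique() is an added assumption: without it, repeated elements would each match their copy and all be dropped.

```r
words <- c("please can you", "please can", "can you",
           "how did you", "did you", "have you")

# m[i, j] is TRUE when words[j] occurs literally inside words[i];
# fixed = TRUE makes grepl treat the pattern as plain text, not a regex.
m <- sapply(words, grepl, words, fixed = TRUE)

# Each column sum counts how many elements contain that word.
# A word found only in itself has a count of 1.
colSums(m)

# Hypothetical helper wrapping the same idea. unique() is an assumption
# added here so that duplicated elements are not removed entirely.
drop_substrings <- function(x) {
  u <- unique(x)
  hits <- vapply(u, function(p) sum(grepl(p, u, fixed = TRUE)), integer(1))
  u[hits == 1L]
}

drop_substrings(words)
# "please can you" "how did you" "have you"
```

Because every word is compared against every other, the work grows quadratically with the length of the vector, which is fine for short vectors like this one.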

Source: https://habr.com/ru/post/1233979/

