Delete entries from a string vector containing specific characters in R

I have two character vectors:

x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

I need to use x to remove entries from y, so that only the entries of y that contain none of the elements of x remain. For this example, the result should be:

y <- c("kot", "kk", "y")

The code should work for any size of the vectors x and y.

So far I have tried gsub and grepl, but they only take a single pattern at a time. I tried to write a loop for this, but the problem turned out to be more complicated than I expected. A more sophisticated solution is of course welcome, but you can assume that in this case x and y have at most 200 entries.

3 answers

Use grep to find the entries of y that contain any element of x, then exclude them with !%in%:

y[!y %in% grep(paste0(x, collapse = "|"), y, value = TRUE)]

#[1] "kot" "kk"  "y"  
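To make the first one-liner easier to follow, here is a sketch that splits it into steps. The intermediate names (pattern, matched, result) are just illustrative; paste0 joins x into one alternation regex, grep returns the matching entries, and %in% removes them.

```r
x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

# Alternation regex built from x: matches "a" OR "b" OR "c" OR "kt"
pattern <- paste0(x, collapse = "|")
print(pattern)
#[1] "a|b|c|kt"

# Entries of y matched by the regex; these are the ones to drop
matched <- grep(pattern, y, value = TRUE)
print(matched)
#[1] "abs" "ccf" "okt"

# Keep every entry of y that is not in the matched set
result <- y[!y %in% matched]
print(result)
#[1] "kot" "kk"  "y"
```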

Or, more directly, with grepl:

y[!grepl(paste0(x, collapse = "|"), y)]

Or with grep, using invert = TRUE and value = TRUE:

grep(paste0(x, collapse = "|"), y, invert = TRUE, value = TRUE)
#[1] "kot" "kk"  "y"  
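One caveat with all three variants: paste0(x, collapse = "|") treats the entries of x as regular expressions. The sketch below uses a made-up x containing "." (which as a regex matches any character) to show the effect, and one possible workaround that tests each entry literally with fixed = TRUE and combines the results with |. The example data and variable names are assumptions for illustration only.

```r
x <- c(".", "b")
y <- c("a.b", "xyz", "abc", "q")

# Regex matching: "." matches any character, so every non-empty entry is dropped
regex_keep <- y[!grepl(paste0(x, collapse = "|"), y)]
# -> character(0)

# Literal matching: one grepl(fixed = TRUE) per entry of x, OR-ed together
literal_hits <- Reduce(`|`, lapply(x, function(p) grepl(p, y, fixed = TRUE)))
literal_keep <- y[!literal_hits]
# -> "xyz" "q"
```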

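As an alternative to @Ronak's answer, here is a less concise solution: use sapply with grepl to test every entry of y against every element of x, then use apply to keep only the rows with no match.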

> y[!apply(sapply(x, function(q) {grepl(q, y)}), 1, function(x) {sum(as.numeric(x)) > 0})]
[1] "kot" "kk"  "y"  

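How it works: sapply produces a logical matrix with one row per entry of y and one column per element of x: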

> sapply(x, function(q) { grepl(q, y) })
         a     b     c    kt
[1,]  TRUE  TRUE FALSE FALSE
[2,] FALSE FALSE FALSE FALSE
[3,] FALSE FALSE  TRUE FALSE
[4,] FALSE FALSE FALSE  TRUE
[5,] FALSE FALSE FALSE FALSE
[6,] FALSE FALSE FALSE FALSE
       ^^^^ each column is a match result for each element of x
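The row-wise apply over the matrix above can also be written with rowSums, which counts the TRUE values in each row. This is a sketch of that variant, still using regex matching as in the answer; switching to vapply and wrapping in matrix() is my own choice, so the result stays a matrix even when y has a single entry (sapply would return a plain vector in that case).

```r
x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

# Logical matrix: rows are entries of y, columns are elements of x.
# vapply() fixes the return type; matrix() keeps a 1-row shape if length(y) == 1.
hits <- matrix(vapply(x, function(q) grepl(q, y), logical(length(y))),
               nrow = length(y))

# A row with no TRUE values means that entry of y matched no element of x
result <- y[rowSums(hits) == 0]
print(result)
#[1] "kot" "kk"  "y"
```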

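Or with Reduce over lapply (fixed = TRUE matches the elements of x literally rather than as regular expressions):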

y[Reduce("+", lapply(x, grepl, y, fixed=TRUE))==0]
# [1] "kot" "kk"  "y"  
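Reduce("+", ...) adds up the logical match vectors, so each entry of y ends up with the number of elements of x it contains. Spelled out as a plain for loop (roughly the loop approach the question mentions), the same idea might look like the sketch below; the names counts and result are illustrative.

```r
x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

# Running count of how many elements of x each entry of y contains
counts <- integer(length(y))
for (p in x) {
  # Adding a logical vector to an integer vector adds 1 where it is TRUE
  counts <- counts + grepl(p, y, fixed = TRUE)
}

# Keep only the entries that matched nothing
result <- y[counts == 0]
print(result)
#[1] "kot" "kk"  "y"
```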

Source: https://habr.com/ru/post/1662391/

