Exclude elements from a vector based on a regular expression pattern

I have some data that I want to clean using a regular expression in R.

It's easy to find out how to select the elements that contain a certain pattern, or how to drop elements that contain certain words (strings), but I can't find out how to exclude the elements that contain a pattern.

How can I use a general function to keep only those elements of a vector that do not contain PATTERN?

I would prefer not to give an example, as that can lead people to answer with other (though usually nice) approaches than the one intended: excluding based on a regular expression. Here is one anyway:

How to exclude all elements containing any of the following characters: 'pyfgcrl

 vector <- c("Cecilia", "Cecily", "Cecily's", "Cedric", "Cedric's", "Celebes", "Celebes's", "Celeste", "Celeste's", "Celia", "Celia's", "Celina") 

In this case, the result is an empty vector.

1 answer

Edit: the comments and a little testing showed that my original suggestion was incorrect.

Here are two correct solutions:

 vector[!grepl("['pyfgcrl]", vector)]                     ## kohske
 grep("['pyfgcrl]", vector, value = TRUE, invert = TRUE)  ## flodel

If either of them wants to post this as their own answer and take the credit, I am more than happy to delete mine.


Explanation

The general function you are looking for is grepl. From the help file for grepl:

grepl returns a logical vector (match or not for each element of x).

In addition, you should read the regex help page, which describes character classes. In this case, you create the character class ['pyfgcrl], which matches any single one of the characters inside the square brackets. You can then negate the result with !.
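
As a small illustration of the character class, here is a sketch on a made-up vector (the names samples, has_char and lacks_char and the input values are assumptions for demonstration, not part of the question). It shows that the class is case-sensitive and that ! flips each element of the logical result.

```r
# Hypothetical input, chosen so that some elements match and some do not
samples <- c("Anna", "Bob's", "play", "Cod")

# TRUE where the element contains at least one of: ' p y f g c r l
has_char <- grepl("['pyfgcrl]", samples)

# Negation: TRUE where none of those characters occur
lacks_char <- !has_char

print(has_char)    # "Cod" does not match: the class lists lowercase c only
print(lacks_char)
```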

So, up to this point, we have something like this:

 !grepl("['pyfgcrl]", vector) 

To get what you are looking for, you subset the vector with that logical vector, as usual.

 vector[!grepl("['pyfgcrl]", vector)] 
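
Because the vector from the question ends up empty, the effect of the subsetting is easier to see on input where something survives. A minimal sketch, assuming a made-up vector words (hypothetical, not from the question):

```r
# Hypothetical input; only "Anna" and "Edith" avoid every listed character
words <- c("Anna", "Cecily's", "Edith", "Celia")

# Keep the elements for which the negated match is TRUE
kept <- words[!grepl("['pyfgcrl]", words)]

print(kept)
```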

For the second solution, suggested by @flodel: by default grep returns the indices of the elements that match, and the argument value = TRUE makes it return the matching strings themselves instead. invert = TRUE makes it return the elements that did not match.
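
The three grep variants can be compared side by side. This is only a sketch on a made-up vector (words, pattern and the result names are assumptions for illustration); it also checks that, on this unnamed input, the grep and grepl approaches give the same result.

```r
# Hypothetical input and pattern
words <- c("Anna", "Cecily's", "Edith", "Celia")
pattern <- "['pyfgcrl]"

idx       <- grep(pattern, words)                               # indices of matching elements
matched   <- grep(pattern, words, value = TRUE)                 # the matching strings
unmatched <- grep(pattern, words, value = TRUE, invert = TRUE)  # the non-matching strings

# Same selection expressed through grepl and logical subsetting
via_grepl <- words[!grepl(pattern, words)]

print(idx)
print(matched)
print(unmatched)
print(identical(unmatched, via_grepl))
```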


Source: https://habr.com/ru/post/1490120/

