R regex to find two words on the same line; order and distance may vary

I want to create a single regular expression (if possible) that tests whether two words occur in the same string. I know that I can use two grepl calls (as shown below), but I want to use one regex to test this condition. The more efficient the regular expression, the better.

I want to find strings that contain both "man" and "dog", case-insensitively.

 x <- c(
   "The dog and the man play in the park.",
   "The man plays with the dog.",
   "That is the man hat.",
   "Man I love that dog!",
   "I'm dog tired"
 )
 ## this works but I want a single regex
 grepl("dog", x, ignore.case=TRUE) & grepl("man", x, ignore.case=TRUE)
2 answers

Use the regex alternation operator |.

 grepl(".*(dog.*man|man.*dog).*", x, ignore.case=TRUE) 

Use word boundaries if necessary.

 grepl(".*(\\bdog\\b.*\\bman\\b|\\bman\\b.*\\bdog\\b).*", x, ignore.case=TRUE) 
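To illustrate when word boundaries matter, here is a small sketch. The vector y is a made-up input (not from the question) in which "man" appears only inside "woman" in the first string:

```r
# Hypothetical input: "man" occurs only as part of "woman" in the first string
y <- c("The woman walks the dog.", "The man walks the dog.")

# Without word boundaries, "woman" satisfies the "man" part of the pattern
loose <- grepl("(dog.*man|man.*dog)", y, ignore.case = TRUE)

# With \\b on both sides, only the whole words "man" and "dog" count
strict <- grepl("(\\bdog\\b.*\\bman\\b|\\bman\\b.*\\bdog\\b)", y, ignore.case = TRUE)

print(loose)   # [1] TRUE TRUE
print(strict)  # [1] FALSE  TRUE
```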

The leading and trailing .* are unnecessary, because grepl matches anywhere in the string.

 grepl("(dog.*man|man.*dog)", x, ignore.case=TRUE) 
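As a quick check, the padded and unpadded patterns can be compared with each other and with the two-grepl approach from the question. This is only a sanity check on the sample data, not a proof of equivalence for every input:

```r
x <- c(
  "The dog and the man play in the park.",
  "The man plays with the dog.",
  "That is the man hat.",
  "Man I love that dog!",
  "I'm dog tired"
)

# Pattern with leading and trailing .*
padded <- grepl(".*(dog.*man|man.*dog).*", x, ignore.case = TRUE)

# Same alternation without the padding
unpadded <- grepl("(dog.*man|man.*dog)", x, ignore.case = TRUE)

# Baseline from the question: two separate grepl calls combined with &
baseline <- grepl("dog", x, ignore.case = TRUE) & grepl("man", x, ignore.case = TRUE)

print(identical(padded, unpadded))    # TRUE
print(identical(unpadded, baseline))  # TRUE
```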

You can specify a case insensitive modifier inside the regular expression itself.

 grepl("(?i)(dog.*man|man.*dog)", x) 

You can use a Perl-like regular expression with two lookaheads:

 grepl("^(?=.*\\bman\\b)(?=.*\\bdog\\b)", x, ignore.case=TRUE, perl=TRUE) 

See the IDEONE demo

Results for input above: [1] TRUE TRUE FALSE TRUE FALSE

The lookaheads in ^(?=.*\\bman\\b)(?=.*\\bdog\\b) match only the whole words man and dog, and the pattern succeeds only if both are present, regardless of their order (dog can come before man, or vice versa).

Because of the ^ start-of-string anchor, these checks are performed only once per input, at the start of the string, which keeps performance good.
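The same idea extends to more than two words by adding one lookahead per word. The helper below is a minimal sketch: contains_all_words is a hypothetical name, not part of base R, and it assumes the words contain no regex metacharacters.

```r
# Hypothetical helper: TRUE where every word in `words` occurs as a whole word.
# Assumes `words` are plain words with no regex metacharacters.
contains_all_words <- function(x, words) {
  # One lookahead per word, all evaluated from the start of the string
  lookaheads <- paste0("(?=.*\\b", words, "\\b)", collapse = "")
  grepl(paste0("^", lookaheads), x, ignore.case = TRUE, perl = TRUE)
}

x <- c(
  "The dog and the man play in the park.",
  "The man plays with the dog.",
  "That is the man hat.",
  "Man I love that dog!",
  "I'm dog tired"
)

two_words <- contains_all_words(x, c("man", "dog"))
three_words <- contains_all_words(x, c("the", "man", "dog"))

print(two_words)    # [1]  TRUE  TRUE FALSE  TRUE FALSE
print(three_words)  # [1]  TRUE  TRUE FALSE FALSE FALSE
```

The fourth string fails the three-word check because "that" is not the whole word "the".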


Source: https://habr.com/ru/post/1232139/

