R regex to find two words: one line, order and distance may vary

Question

I want to create one regular expression (if possible) that searches strings and determines whether two words occur in the same string. I know that I can use two grepl calls (as shown below), but I want to test this condition with a single regex. The more efficient the regular expression, the better.

I want to find strings that contain both “man” and “dog”, case-insensitively.

 x <- c(
   "The dog and the man play in the park.",
   "The man plays with the dog.",
   "That is the man hat.",
   "Man I love that dog!",
   "I'm dog tired"
 )

 ## this works but I want a single regex
 grepl("dog", x, ignore.case=TRUE) & grepl("man", x, ignore.case=TRUE)

+4

regex r

Tyler Rinker Sep 23 '15 at 13:31

2 answers

You can use a Perl-like regular expression with two lookaheads:

 grepl("^(?=.*\\bman\\b)(?=.*\\bdog\\b)", x, ignore.case=TRUE, perl=TRUE)

Result for the input above:

 [1] TRUE TRUE FALSE TRUE FALSE

The lookaheads in ^(?=.*\\bman\\b)(?=.*\\bdog\\b) check for the whole words man and dog anywhere in the input, and the match succeeds only if both are present, regardless of their order (dog can come before man, or vice versa).

Because the pattern is anchored at the start of the string with ^, the lookaheads are evaluated only once per input rather than at every position, which keeps performance good.
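As a quick sanity check, the lookahead pattern can be compared with the two-call version from the question. This is only a minimal sketch: the variable names `pattern`, `single`, and `double` are introduced here for illustration and are not part of the original answer.

```r
# Sample vector from the question
x <- c(
  "The dog and the man play in the park.",
  "The man plays with the dog.",
  "That is the man hat.",
  "Man I love that dog!",
  "I'm dog tired"
)

# Single regex: two lookaheads anchored at the start of the string.
# perl = TRUE is required because lookaheads are a PCRE feature.
pattern <- "^(?=.*\\bman\\b)(?=.*\\bdog\\b)"
single <- grepl(pattern, x, ignore.case = TRUE, perl = TRUE)

# Two-call version from the question, for comparison
double <- grepl("dog", x, ignore.case = TRUE) &
  grepl("man", x, ignore.case = TRUE)

# Both approaches agree on this input
print(identical(single, double))
```

Note that the two approaches agree on this particular input only. The lookahead pattern uses \\b word boundaries while the two-call version does not, so they can differ on strings where "man" or "dog" appears only inside a longer word.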

+2

Wiktor Stribiżew Sep 23 '15 at 13:36

Avinash Raj · Accepted Answer · 2015-09-23T13:35:54+0000

Use the regex alternation operator | to cover both orders.

 grepl(".*(dog.*man|man.*dog).*", x, ignore.case=TRUE)

Use word boundaries if necessary.

 grepl(".*(\\bdog\\b.*\\bman\\b|\\bman\\b.*\\bdog\\b).*", x, ignore.case=TRUE)

The leading and trailing .* are not needed, since grepl matches anywhere in the string:

 grepl("(dog.*man|man.*dog)", x, ignore.case=TRUE)

You can specify a case insensitive modifier inside the regular expression itself.

 grepl("(?i)(dog.*man|man.*dog)", x)
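To see what the word boundaries change, the variants above can be run on a few extra strings. This is a hedged sketch: the vector `y` and the names `plain`, `padded`, and `bounded` are hypothetical and chosen only so that "man" and "dog" also appear inside longer words.

```r
# Hypothetical test strings (not from the question)
y <- c(
  "The human walks the dog.",  # "man" occurs only inside "human"
  "A dogmatic man.",           # "dog" occurs only inside "dogmatic"
  "dog and man"                # both occur as whole words
)

# Alternation without word boundaries: substrings are enough to match
plain <- grepl("(dog.*man|man.*dog)", y, ignore.case = TRUE)

# The leading and trailing .* do not change which strings match
padded <- grepl(".*(dog.*man|man.*dog).*", y, ignore.case = TRUE)

# Alternation with word boundaries: only whole words count
bounded <- grepl("(\\bdog\\b.*\\bman\\b|\\bman\\b.*\\bdog\\b)", y,
                 ignore.case = TRUE)

print(plain)    # TRUE TRUE TRUE
print(padded)   # TRUE TRUE TRUE
print(bounded)  # FALSE FALSE TRUE
```

Which variant is appropriate depends on whether partial-word hits such as "human" should count as a match for "man".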
