Regex match if another substring does not match

I am trying to dig deeper into regular expressions and want to match a condition only if some other substring is not found in the same string. I know I can use two grepl calls (as shown below), but I want to use a single regex to test this condition, as I am pushing my understanding. Let's say I want to match the words “dog” and “man” using "(dog.*man|man.*dog)" (taken from here), but not if the string contains the substring “park”. I figured I could use (*SKIP)(*FAIL) to negate "park", but this does not cause the string to fail (shown below).

  • How can I combine the search logic “dog” and “man”, but not “park”, in one regex?
  • What is wrong with my understanding of (*SKIP)(*FAIL)| ?

Code:

 x <- c(
     "The dog and the man play in the park.",
     "The man plays with the dog.",
     "That is the man hat.",
     "Man I love that dog!",
     "I'm dog tired",
     "The dog park is no place for man.",
     "Park next to this dog man."
 )

 # Could do this but want one regex
 grepl("(dog.*man|man.*dog)", x, ignore.case=TRUE) & !grepl("park", x, ignore.case=TRUE)

 # Thought this would work, it does not
 grepl("park(*SKIP)(*FAIL)|(dog.*man|man.*dog)", x, ignore.case=TRUE, perl=TRUE)
2 answers

You can use an anchored lookahead solution (it requires a Perl-style regex):

 grepl("^(?!.*park)(?=.*dog.*man|.*man.*dog)", x, ignore.case=TRUE, perl=TRUE)

Here is the IDEONE demo

  • ^ - anchors the pattern at the start of the string
  • (?!.*park) - negative lookahead: the match fails if park is present
  • (?=.*dog.*man|.*man.*dog) - positive lookahead: the match fails unless both man and dog are present.

Another (more scalable) version with three lookaheads:

 ^(?!.*park)(?=.*dog)(?=.*man) 
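
To illustrate why this second form scales better, here is a minimal sketch that assembles the pattern from word lists. The helper name build_pattern and its arguments are hypothetical (they are not part of the answer), and the words are assumed to contain no regex metacharacters.

```r
# Hypothetical helper: one lookahead per word, so adding a word
# means adding one element to a vector rather than rewriting alternations.
build_pattern <- function(required, forbidden = character(0)) {
  # sprintf() returns character(0) for empty input, so an empty
  # 'forbidden' vector contributes nothing to the pattern.
  neg <- paste(sprintf("(?!.*%s)", forbidden), collapse = "")
  pos <- paste(sprintf("(?=.*%s)", required), collapse = "")
  paste0("^", neg, pos)
}

x <- c(
  "The dog and the man play in the park.",
  "The man plays with the dog.",
  "That is the man hat.",
  "Man I love that dog!",
  "I'm dog tired",
  "The dog park is no place for man.",
  "Park next to this dog man."
)

pat <- build_pattern(required = c("dog", "man"), forbidden = "park")
pat
# "^(?!.*park)(?=.*dog)(?=.*man)"

# Lookaheads need perl = TRUE in base R
result <- grepl(pat, x, ignore.case = TRUE, perl = TRUE)
result
# FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE
```

Only the second and fourth strings contain both words without “park”.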

stribizhev already answered this question the way it should be approached: with a negative lookahead.

I will address this specific question:

What is wrong with my understanding of (*SKIP)(*FAIL)?

(*SKIP) and (*FAIL) are regex backtracking control verbs.

  • (*FAIL) or (*F)
    This is the easiest one to understand. (*FAIL) is exactly the same as a negative lookahead with an empty subpattern: (?!) . As soon as the regex engine reaches this verb in the pattern, it immediately backtracks.
  • (*SKIP)
    When the regex engine first encounters this verb, nothing happens, because it only acts when it is reached on backtracking. But if there is a later failure and the engine reaches (*SKIP) from right to left, backtracking cannot pass (*SKIP) . It causes:

    • A match failure for the current attempt.
    • The next match attempt will not start at the next character. Instead, it starts from the position in the subject where the engine was when it reached (*SKIP) .

    That's why these two control verbs are usually combined as (*SKIP)(*FAIL)
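
As a small check of the (*FAIL) equivalence described above, the following sketch compares the verb with the empty negative lookahead (?!) in R. The subjects vector and the variable names are made up for illustration; perl = TRUE is needed because these constructs are PCRE features.

```r
# Made-up subjects, chosen only to exercise both alternatives
subjects <- c("a", "b", "ab")

# "a" followed by a forced failure, or else "b"
with_verb      <- grepl("a(*FAIL)|b", subjects, perl = TRUE)
with_lookahead <- grepl("a(?!)|b",    subjects, perl = TRUE)

with_verb
# FALSE  TRUE  TRUE

# Both spellings should give the same answers
identical(with_verb, with_lookahead)
```

The first alternative can never succeed in either spelling, so a string matches only if it contains a “b”.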

Consider the following example:

  • Pattern: .*park(*SKIP)(*FAIL)|.*dog
  • Subject: "That park has too many dogs"
  • Matches: " has too many dog"
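
The example can be reproduced in R, and dropping (*SKIP) shows what the verb contributes. This is a sketch under the assumption that R is built against a reasonably recent PCRE; first_match is a hypothetical helper name introduced here for illustration.

```r
subject <- "That park has too many dogs"

# Hypothetical helper: return the text of the first match
first_match <- function(pattern, text) {
  regmatches(text, regexpr(pattern, text, perl = TRUE))
}

# With (*SKIP): the failed attempt moves the next start past "That park"
with_skip <- first_match(".*park(*SKIP)(*FAIL)|.*dog", subject)
with_skip
# " has too many dog"

# Without (*SKIP): after (*FAIL) the engine just tries the second
# alternative from the same start position (the first character)
without_skip <- first_match(".*park(*FAIL)|.*dog", subject)
without_skip
# "That park has too many dog"
```

In both cases the string still matches; (*SKIP) only changes where the match is allowed to begin.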

Internally:

  1. First attempt.

    That park has too many dogs
    ||
    .*park(*SKIP)(*FAIL)|.*dog
    /\    /\
          (here) we have a match for park
                 the engine passes (*SKIP) - no action
                 it then encounters (*FAIL) - backtrack
                 now it reaches (*SKIP) from the right - FAIL!

  2. Second attempt.
    Normally it would start at the second character of the subject. However, (*SKIP) has this particular behavior. The second attempt begins:

    That park has too many dogs
             ||
             .*park(*SKIP)(*FAIL)|.*dog
             /\                   /\
             (here)               now there is no match for .*park,
                                  and of course it matches .*dog

    That park has too many dogs
             ^               ^
             |    (MATCH!)   |
             +---------------+
             .*park(*SKIP)(*FAIL)|.*dog
                                  -----
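
The same reasoning suggests why the pattern from the question does not reject strings containing “park”: (*SKIP)(*FAIL) only discards the text consumed so far, and the engine then keeps looking for “dog ... man” in the rest of the string (or finds it before ever reaching “park”). A sketch using the question's data; the variable names attempt and where are introduced here for illustration.

```r
x <- c(
  "The dog and the man play in the park.",
  "The man plays with the dog.",
  "That is the man hat.",
  "Man I love that dog!",
  "I'm dog tired",
  "The dog park is no place for man.",
  "Park next to this dog man."
)

# The attempt from the question: "park" is skipped, not forbidden
attempt <- grepl("park(*SKIP)(*FAIL)|(dog.*man|man.*dog)", x,
                 ignore.case = TRUE, perl = TRUE)
attempt
# TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE

# In the last string, "Park" is skipped and the match is found after it
where <- regmatches(x[7], regexpr("park(*SKIP)(*FAIL)|(dog.*man|man.*dog)",
                                  x[7], ignore.case = TRUE, perl = TRUE))
where
# "dog man"
```

Every string that has both words still matches, whether or not it contains “park”.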

Demo


How can I combine the search logic “dog” and “man”, but not “park” with 1 regular expression?

Use stribizhev's solution! Try to avoid control verbs for the sake of compatibility; they are not implemented in all regex flavors. But if you are interested in these regex oddities, there is another, stronger control verb: (*COMMIT) . It is similar to (*SKIP) in that it acts only on backtracking, except that it causes the entire match to fail (there will be no other attempt at all). For example:

 +-----------------------------------------------+
 |Pattern:                                       |
 |^.*park(*COMMIT)(*FAIL)|dog                    |
 +-------------------------------------+---------+
 |Subject                              | Matches |
 +-------------------------------------+---------+
 |The dog and the man play in the park.| FALSE   |
 |Man I love that dog!                 | TRUE    |
 |I'm dog tired                        | TRUE    |
 |The dog park is no place for man.    | FALSE   |
 |park next to this dog man.           | FALSE   |
 +-------------------------------------+---------+
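
The table can be checked in R with a short sketch like the one below. The variable names subjects and matches are introduced here for illustration, and the call assumes R's PCRE build supports (*COMMIT).

```r
subjects <- c(
  "The dog and the man play in the park.",
  "Man I love that dog!",
  "I'm dog tired",
  "The dog park is no place for man.",
  "park next to this dog man."
)

# If "park" is found anywhere, (*COMMIT)(*FAIL) aborts the whole match;
# otherwise the second alternative looks for "dog"
matches <- grepl("^.*park(*COMMIT)(*FAIL)|dog", subjects, perl = TRUE)

data.frame(Subject = subjects, Matches = matches)
```

Unlike the (*SKIP) version, no further attempt is made once (*COMMIT) has been backtracked onto, so “dog” after “park” is never considered.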

IDEONE demo


Source: https://habr.com/ru/post/1232138/

