Regular expression for a third-person verb

I am trying to create a regular expression that matches the third-person form of a verb created using the following rule:

If the verb ends in e that is not preceded by i, o, s, x, z, ch, or sh, add s.

So I am looking for a regular expression that matches a word consisting of some letters, then something other than i, o, s, x, z, ch, or sh, and then "es". I tried this:

\b\w*[^iosxz(sh)(ch)]es\b 

According to regex101, it matches "likes", "hates", etc. However, it does not match "bathes". Why not?

2 answers

You can use

 \b(?=\w*(?<![iosxz])(?<![cs]h)es\b)\w* 

See the regex demo.

Since Python's re module does not support alternatives of different lengths inside a lookbehind, you need to split the condition into two lookbehinds here.

Pattern details:

  • \b - a leading word boundary
  • (?=\w*(?<![iosxz])(?<![cs]h)es\b) - a positive lookahead that requires the following sequence:
    • \w* - zero or more word characters
    • (?<![iosxz]) - there must be no i, o, s, x, or z immediately before the current position, and ...
    • (?<![cs]h) - no ch or sh immediately before the current position ...
    • es - a literal es follows ...
    • \b - at the end of the word
  • \w* - zero or more word characters (+ may be better here, to match one or more).

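See the Python demo:

 import re
 # Two fixed-width lookbehinds replace the single variable-length one
 r = re.compile(r'\b(?=\w*(?<![iosxz])(?<![cs]h)es\b)\w*')
 s = 'it matches "likes", "hates" etc. However, it does not match "bathes", why doesn\'t it?'
 print(re.findall(r, s))  # ['likes', 'hates', 'bathes']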
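
To see which ending each lookbehind rejects, the pattern can also be checked word by word. This is only a minimal sketch: it uses the \w+ variant mentioned in the pattern details, and the helper name matches_rule and the word list are illustrative assumptions, not part of the original answer.

```python
import re

# The answer's pattern, with \w+ in place of the final \w*.
PATTERN = re.compile(r'\b(?=\w*(?<![iosxz])(?<![cs]h)es\b)\w+')

def matches_rule(word: str) -> bool:
    """Hypothetical helper: True if the whole word satisfies the pattern."""
    return PATTERN.fullmatch(word) is not None

# Illustrative list: three words that should pass, then one per excluded ending.
words = ['likes', 'hates', 'bathes', 'dies', 'goes', 'misses',
         'fixes', 'buzzes', 'matches', 'washes']

for word in words:
    print(f'{word:8} {matches_rule(word)}')
```

Only the first three words should print True; each of the others is stopped by one of the two lookbehinds.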

If you want to match an e that ends the string and is not preceded by i, o, s, x, z, ch, or sh, you should use

 (?<!i|o|s|x|z|ch|sh)e 
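
One caveat if the target is Python's built-in re module: a lookbehind whose alternatives differ in length is rejected at compile time, although some other engines, such as PCRE, accept it. The sketch below checks this and shows the equivalent two-lookbehind split. The helper name compiles and the sample words are assumptions made for illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Hypothetical helper: report whether Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

single = r'(?<!i|o|s|x|z|ch|sh)e'      # alternatives of length 1 and 2
split = r'(?<![iosxz])(?<![cs]h)e'     # two fixed-width lookbehinds

print(compiles(single))  # False: look-behind requires fixed-width pattern
print(compiles(split))   # True

# The split form applied to whole words ending in e.
found = re.findall(r'\b\w*' + split + r'\b', 'bathe like ache die')
print(found)  # ['bathe', 'like']
```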

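Your regular expression [^iosxz(sh)(ch)] is a character class: ^ just negates it, and everything else inside is matched literally, one character at a time. Parentheses do not group inside a class, so it is equivalent to:

 [^io)sxz(ch]

which actually means: "match any single character that is not one of i, o, ), s, x, z, (, c, h". Since h is in that set, the h before "es" in "bathes" is rejected, which is why the word does not match.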
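
The equivalence can be checked directly. The following is a small sketch in Python; the variable names and the sample string are chosen here for illustration only.

```python
import re

original = re.compile(r'[^iosxz(sh)(ch)]')
simplified = re.compile(r'[^io)sxz(ch]')

# Collect the characters each negated class refuses to match.
chars = 'abcdefghijklmnopqrstuvwxyz()'
rejected_original = {c for c in chars if not original.fullmatch(c)}
rejected_simplified = {c for c in chars if not simplified.fullmatch(c)}

print(sorted(rejected_original))
# ['(', ')', 'c', 'h', 'i', 'o', 's', 'x', 'z']
print(rejected_original == rejected_simplified)  # True

# The pattern from the question: "bathes" fails because h is excluded.
attempt = re.compile(r'\b\w*[^iosxz(sh)(ch)]es\b')
found = attempt.findall('likes hates bathes')
print(found)  # ['likes', 'hates']
```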


Source: https://habr.com/ru/post/1012278/

