Matching regex with optional lookahead

I have the following lines:

NAME John Nash FROM California

NAME John Nash

I want a regular expression that can extract "John Nash" for both lines.

Here is what I tried:

 "NAME(.*)(?:FROM)"
 "NAME(.*)(?:FROM)?"
 "NAME(.*?)(?:FROM)?"

but none of them work for both lines.

4 answers

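You can use alternation (a logical OR) between FROM and the end-of-line anchor $, and make the capture group lazy so that it stops at the first of the two:

 NAME(.*?)(?:FROM|$) 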

See demo https://regex101.com/r/rR3gA0/1

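Here the name must be followed by either FROM or the end of the line. In your patterns FROM is optional, so nothing forces the capture group to stop before it: the greedy (.*) runs to the end of the line, and the lazy (.*?) matches nothing at all. Note that the captured group still includes the spaces around the name, so trim them afterwards.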
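To make the difference concrete, here is a small sketch in Python that runs the three attempts from the question and a lazy "FROM or end of line" pattern against both sample lines. The `capture` helper and the labels are made up for illustration, and the groups are stripped because they include the surrounding spaces.

```python
import re

lines = ["NAME John Nash FROM California", "NAME John Nash"]

# The three attempts from the question, plus a lazy alternation variant.
patterns = {
    "required FROM": r"NAME(.*)(?:FROM)",
    "greedy, optional FROM": r"NAME(.*)(?:FROM)?",
    "lazy, optional FROM": r"NAME(.*?)(?:FROM)?",
    "lazy, FROM or end": r"NAME(.*?)(?:FROM|$)",
}


def capture(pattern, line):
    """Hypothetical helper: return group 1 without outer spaces, or None."""
    m = re.search(pattern, line)
    return m.group(1).strip() if m else None


results = {}
for label, pattern in patterns.items():
    results[label] = [capture(pattern, line) for line in lines]
    print(f"{label:24} {results[label]}")
```

Only the last pattern gives 'John Nash' for both lines: the first fails when FROM is missing, the second swallows "FROM California", and the third captures an empty string.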

If you want a more general regular expression, it is better to build it around the form your names take. For example, if you are sure that every name consists of exactly two words, you can use the following regular expression:

 NAME\s(\w+\s\w+) 

Demo https://regex101.com/r/kV2eB9/2
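A quick way to try the two-word pattern from Python, as a sketch (the `two_word_name` function is a hypothetical name, not part of the answer). It also shows the limitation: a three-word name is cut off after the second word.

```python
import re

# Exactly two "word" tokens after NAME, separated by one whitespace character.
TWO_WORDS = re.compile(r"NAME\s(\w+\s\w+)")


def two_word_name(line):
    """Hypothetical helper: return the two-word name, or None if no match."""
    m = TWO_WORDS.search(line)
    return m.group(1) if m else None


print(two_word_name("NAME John Nash FROM California"))
print(two_word_name("NAME John Nash"))
print(two_word_name("NAME John Doe Nash"))  # only the first two words survive
```

So this pattern is only safe when the two-word assumption really holds for all of your data.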


Make the second part of the line optional with (?: FROM.*?)?, i.e.:

 NAME (.*?)(?: FROM.*?)?$ 

 MATCH 1
 1. [5-14] `John Nash`
 MATCH 2
 1. [37-46] `John Nash`
 MATCH 3
 1. [53-66] `John Doe Nash`

Regex Demo
https://regex101.com/r/bL7kI2/2
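The same pattern can be tried in Python. This is a sketch under one assumption: the sample text below is made up (three lines, one per match above), and `re.MULTILINE` is needed so that $ matches at the end of every line rather than only at the end of the whole string.

```python
import re

# Assumed sample input: one record per line.
text = (
    "NAME John Nash FROM California\n"
    "NAME John Nash\n"
    "NAME John Doe Nash"
)

# Lazy name, then an optional " FROM..." tail, anchored at end of line.
pattern = re.compile(r"NAME (.*?)(?: FROM.*?)?$", re.MULTILINE)

# With a single capture group, findall returns the group for each match.
names = pattern.findall(text)
print(names)
```

Because the space before FROM is part of the optional group, the captured name needs no trimming here.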


You can do this without a regex:

 >>> myStr = "NAME John Nash FROM California"
 >>> myStr.split("FROM")[0].replace("NAME","").strip()
 'John Nash'
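If you go the no-regex route for many lines, it may be worth wrapping it in a function. The sketch below is one possible variant, not the answer's exact code: `extract_name` is a hypothetical name, and it uses `str.partition` on " FROM " (with spaces) so that a name that merely contains the letters FROM or NAME is not cut or altered.

```python
def extract_name(line):
    """Hypothetical helper: return the text between 'NAME ' and ' FROM '."""
    prefix = "NAME "
    if not line.startswith(prefix):
        return None  # not a NAME line
    rest = line[len(prefix):]
    # partition returns the whole string as the first item if ' FROM ' is absent.
    name, _, _ = rest.partition(" FROM ")
    return name.strip()


print(extract_name("NAME John Nash FROM California"))
print(extract_name("NAME John Nash"))
```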

 r'^\w+\s+(\w+\s+\w+)' - a word at the start of the string, followed by one or more spaces, then two words with at least one space between them:

 import re

 with open('data', 'r') as f:
     for line in f:
         mo = re.search(r'^\w+\s+(\w+\s+\w+)', line)
         if mo:
             print(mo.group(1))

 Output:

 John Nash
 John Nash

Source: https://habr.com/ru/post/1233563/

