How to capture the whole word when using lookaround in a regex?

I need to find all words that consist only of the letters "a" and "b", where each "a" is immediately preceded by "b" and immediately followed by "b".

For instance:

mystring = 'bab babab babbab ab baba aba xyz' 

Then my regex should return:

 ['bab', 'babab', 'babbab'] 

(In "ab" and "aba" the first "a" is not preceded by "b"; in "baba" the last "a" is not followed by "b"; and "xyz" does not consist only of "a" and "b".)

I used lookaround (lookbehind and lookahead) for this and wrote this regex:

 re.findall(r'((?<=b)a(?=b))',mystring) 

But this returns only the individual "a" characters that are preceded and followed by "b":

 ['a', 'a', 'a', 'a', 'a', 'a'] 

But I need whole words. How can I match whole words with a regular expression? I tried changing my regex in various ways, but nothing worked.

2 answers

You can use the following regular expression:

 >>> re.findall(r'\b(?:b+a)+b+\b', mystring)
 ['bab', 'babab', 'babbab']

This regular expression matches one or more groups of the form b+a (one or more "b" followed by a single "a"), so every "a" is preceded by "b". It then requires one or more trailing "b", so every "a" is also followed by "b". The \b on each side anchors the match to word boundaries, so only whole words are returned.
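As a rough sanity check, the pattern above can be compared against a plain-Python version of the rule from the question. The helper name is_valid is hypothetical and only for illustration. It assumes a word must contain at least one "a", because the pattern itself never matches a word made only of "b".

```python
import re

# Pattern from the answer above: one or more "b...ba" groups, then trailing
# b's, anchored to word boundaries so that only whole words match.
PATTERN = re.compile(r'\b(?:b+a)+b+\b')

def is_valid(word):
    """Hypothetical helper: only 'a'/'b', at least one 'a', and every 'a'
    has a 'b' directly before it and directly after it."""
    if not word or set(word) - {'a', 'b'} or 'a' not in word:
        return False
    return all(
        0 < i < len(word) - 1 and word[i - 1] == 'b' and word[i + 1] == 'b'
        for i, ch in enumerate(word) if ch == 'a'
    )

mystring = 'bab babab babbab ab baba aba xyz'
matches = PATTERN.findall(mystring)
expected = [w for w in mystring.split() if is_valid(w)]

print(matches)              # ['bab', 'babab', 'babbab']
print(matches == expected)  # True
```

If words such as "bb" should also count as valid, both the pattern and the helper would need to be relaxed; the question does not say either way.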


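Try

 re.findall(r'(?:b+ab+)+', mystring)

if bbbabb is also allowed. You do not need lookahead or lookbehind. The group is non-capturing because re.findall returns only the group's text, not the whole match, when the pattern contains a capturing group.

Edit: Yes, to also match babab (where two "a" share the "b" between them), etc., it must be

 re.findall(r'(?:b+a)+b+', mystring)

Add \b on both sides of the pattern to restrict matches to whole words; without it, bab is also found inside baba.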
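The following sketch, using the sample string from the question, illustrates two details that are easy to miss with this pattern: how re.findall behaves when the group is capturing, and what the \b anchors change. The variable names are chosen here only for illustration.

```python
import re

mystring = 'bab babab babbab ab baba aba xyz'

# With a capturing group, findall returns the text of the group's last
# repetition instead of the full match.
captured = re.findall(r'(b+a)+b+', mystring)

# With a non-capturing group, findall returns the full match, but "bab"
# is still found inside "baba".
full = re.findall(r'(?:b+a)+b+', mystring)

# Word boundaries reject the partial match inside "baba".
whole_words = re.findall(r'\b(?:b+a)+b+\b', mystring)

print(captured)     # ['ba', 'ba', 'bba', 'ba']
print(full)         # ['bab', 'babab', 'babbab', 'bab']
print(whole_words)  # ['bab', 'babab', 'babbab']
```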

Source: https://habr.com/ru/post/1232866/

