How to capture the entire string when using "lookaround" with regex characters?

Question


I need to find all words in a string that consist only of the letters "a" and "b", where every "a" is immediately preceded by a "b" and immediately followed by a "b".

For instance:

mystring = 'bab babab babbab ab baba aba xyz'

Then my regex should return:

 ['bab', 'babab', 'babbab']

(The word "ab" is excluded because its "a" is not preceded by "b". "baba" and "aba" fail for the same kind of reason, and "xyz" does not consist only of "a" and "b".)
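For reference, the rule can also be checked without a regular expression, which is handy for testing any pattern against it. This is only a sketch, not code from the thread; `is_valid_word` is a hypothetical helper, and it treats a word with no "a" at all (such as "bb") as valid, an edge case the question does not specify.

```python
def is_valid_word(word: str) -> bool:
    """Return True if word contains only 'a'/'b' and every 'a' is
    directly preceded and directly followed by 'b'."""
    # Reject empty words and words containing any other character.
    if not word or set(word) - {"a", "b"}:
        return False
    for i, ch in enumerate(word):
        if ch == "a":
            # An 'a' at either end cannot have a 'b' on both sides.
            if i == 0 or i == len(word) - 1:
                return False
            if word[i - 1] != "b" or word[i + 1] != "b":
                return False
    return True


mystring = 'bab babab babbab ab baba aba xyz'
print([w for w in mystring.split() if is_valid_word(w)])
# ['bab', 'babab', 'babbab']
```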

I used lookarounds (a lookbehind and a lookahead) for this and wrote this regex:

 re.findall(r'((?<=b)a(?=b))',mystring)

But this only returns each individual "a" that is preceded and followed by "b":

 ['a', 'a', 'a', 'a', 'a', 'a']

But I need whole words. How can I match whole words with a regular expression? I have tried various changes to my regex, but nothing works.
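One way to keep the lookarounds (a sketch, not taken from either answer) is to use them only to validate each "a", while the match itself consumes every character of the word. Because lookarounds are zero-width, the original pattern could only ever return the "a" it consumed; here the `+` repetition consumes the surrounding "b" characters too, and the `\b` anchors force the match to cover the whole word. Unlike the accepted answer's pattern, this version also accepts words made only of "b" (such as "bb").

```python
import re

mystring = 'bab babab babbab ab baba aba xyz'

# Each token is either a 'b', or an 'a' whose neighbours are both 'b'
# (checked with a zero-width lookbehind and lookahead).
# \b on both ends forces the match to span the whole word.
pattern = re.compile(r'\b(?:b|(?<=b)a(?=b))+\b')

# No capturing groups, so findall returns the full matches.
print(pattern.findall(mystring))
# ['bab', 'babab', 'babbab']
```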


python regex regex-group regex-lookarounds state-machines

Karthik elango Oct 3 '15 at 19:25


2 answers

Try

 re.findall(r'(b+ab+)+', mystring)

if "bbbabb" should also be allowed. You do not need lookahead or lookbehind.

Edit: To also match "babab" (alternating "b" and "a") and similar words, it should be:

 re.findall(r'(?:b+a)+b+', mystring)
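A detail of `re.findall` matters here: when a pattern contains a capturing group, `findall` returns the group's text (for a repeated group, only its last repetition) instead of the whole match. The comparison below uses the question's `mystring` to show the difference between a capturing group, a non-capturing `(?:...)` group, and reading `group(0)` from `re.finditer`.

```python
import re

mystring = 'bab babab babbab ab baba aba xyz'

# Capturing group: findall returns the last repetition of group 1.
print(re.findall(r'(b+a)+b+', mystring))
# ['ba', 'ba', 'bba', 'ba']

# Non-capturing group: findall returns the full match.
print(re.findall(r'(?:b+a)+b+', mystring))
# ['bab', 'babab', 'babbab', 'bab']

# Or keep the group and take the whole match from finditer.
print([m.group(0) for m in re.finditer(r'(b+a)+b+', mystring)])
# ['bab', 'babab', 'babbab', 'bab']
```

The last two calls also return a "bab" taken from inside "baba", because this pattern has no word boundaries; wrapping it in `\b`, as the accepted answer does, removes that match.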


dev.null Oct 3 '15 at 19:39


Kasramvd · Accepted Answer · 2015-10-03T19:43:31+0000

You can use the following regular expression:

 >>> re.findall(r'\b(?:b+a)+b+\b', mystring)
 ['bab', 'babab', 'babbab']

(Debuggex demo: diagram of this regular expression)

As the diagram shows, this regular expression matches one or more repetitions of "ba" (where each "b" may occur more than once), followed by one or more "b" characters. Every "a" is therefore preceded by a "b" and followed by a "b", and the \b anchors force the match to span a whole word.
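When each candidate is checked on its own (for example, one word per line), anchoring the whole string is enough and `\b` is not needed. Below is a minimal sketch using `re.fullmatch`, with `re.MULTILINE` and `^...$` as an alternative; `matching_lines` and `WORD_RE` are hypothetical names chosen for illustration.

```python
import re

# Same core pattern as the accepted answer; fullmatch already
# requires the entire string to match, so \b is unnecessary.
WORD_RE = re.compile(r'(?:b+a)+b+')

def matching_lines(lines):
    """Return the stripped lines that consist entirely of the pattern."""
    return [line.strip() for line in lines if WORD_RE.fullmatch(line.strip())]

text = """bab
babab
babbab
ab
baba
aba
xyz"""

print(matching_lines(text.splitlines()))
# ['bab', 'babab', 'babbab']

# Alternative: ^ and $ match at each line boundary with re.MULTILINE.
print(re.findall(r'^(?:b+a)+b+$', text, flags=re.MULTILINE))
# ['bab', 'babab', 'babbab']
```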
