Python: how to find consecutive pairs of letters by regular expression?

Question


I want to use a regex to find words that contain consecutive pairs of letters. I know that a single pair, as in zoo (oo), puzzle (zz), or arrangement (rr), can be matched with '(\w){2}'. But what about

two consecutive pairs: committee (ttee)
three consecutive pairs: bookkeeper (ookkee)

edit:

'(\w){2}' is actually wrong: it matches any two letters rather than a pair of identical letters.
I want to find the words that contain letter pairs, not the pairs themselves.
By "consecutive," I mean that there is no other letter between the pairs.

+6

python regex

Chuntao lu Jul 10 '13 at 0:43


4 answers

Use re.finditer:

 >>> import re
 >>> [m.group() for m in re.finditer(r'((\w)\2)+', 'zoo')]
 ['oo']
 >>> [m.group() for m in re.finditer(r'((\w)\2)+', 'arrange')]
 ['rr']
 >>> [m.group() for m in re.finditer(r'((\w)\2)+', 'committee')]
 ['mm', 'ttee']
 >>> [m.group() for m in re.finditer(r'((\w)\2)+', 'bookkeeper')]
 ['ookkee']

Check whether the string contains two consecutive pairs:

 >>> bool(re.search(r'((\w)\2){2}', 'zoo'))
 False
 >>> bool(re.search(r'((\w)\2){2}', 'arrange'))
 False
 >>> bool(re.search(r'((\w)\2){2}', 'committee'))
 True
 >>> bool(re.search(r'((\w)\2){2}', 'bookkeeper'))
 True

You can also use the following version, which makes the outer group non-capturing with (?:...):

 (?:(\w)\1){2}
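Here is a minimal sketch of how the {2} quantifier might be generalized to any number of back-to-back pairs. The helper name has_consecutive_pairs and its parameter n are made up for illustration and are not part of the answer above. Keep in mind that \w also matches digits and underscores, so this assumes the input is plain words.

```python
import re

def has_consecutive_pairs(word, n):
    """Return True if `word` contains n doubled characters in a row."""
    # (?:(\w)\1) matches one doubled character; {n} requires n of them
    # back to back. The helper name and `n` are hypothetical.
    pattern = r'(?:(\w)\1){%d}' % n
    return re.search(pattern, word) is not None

print(has_consecutive_pairs('zoo', 1))         # True
print(has_consecutive_pairs('committee', 2))   # True
print(has_consecutive_pairs('committee', 3))   # False
print(has_consecutive_pairs('bookkeeper', 3))  # True
```

Because the outer group is non-capturing, the backreference is \1 rather than \2.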

+14

falsetru Jul 10 '13 at 0:54


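Since you mentioned that you want to check the words in a list, here is a version that does that, based on falsetru's answer:

 import re

 words = ['zoo', 'cat', 'committee']  # example input
 newlist = []
 for word in words:
     # keep the word if it contains at least one doubled letter
     if re.search(r'((\w)\2)+', word):
         newlist.append(word)
 print(newlist)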
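The same filtering could probably be written as a list comprehension with a precompiled pattern. This is only a sketch: the names DOUBLE and words_with_pairs are invented here, and the simpler pattern (\w)\1 is used because only the presence of a pair matters for filtering.

```python
import re

# Hypothetical names; (\w)\1 matches any doubled word character.
DOUBLE = re.compile(r'(\w)\1')

def words_with_pairs(words):
    """Return the words that contain at least one doubled letter."""
    return [w for w in words if DOUBLE.search(w)]

print(words_with_pairs(['zoo', 'cat', 'committee']))  # ['zoo', 'committee']
```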

0

tekknolagi Jul 10 '13 at 4:27


To detect a run of two or more identical consecutive letters, the regular expression becomes: (\w)\1+
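A quick sketch of what this pattern returns, for comparison with the question. The function name letter_runs is made up for illustration. As far as I can tell, it finds runs of a single repeated letter, so adjacent pairs of different letters (like the 'ttee' in committee) come back as separate matches rather than one.

```python
import re

RUN = re.compile(r'(\w)\1+')

def letter_runs(word):
    """Return every run of one repeated character (hypothetical helper)."""
    return [m.group() for m in RUN.finditer(word)]

print(letter_runs('committee'))  # ['mm', 'tt', 'ee']
print(letter_runs('aaab'))       # ['aaa']
```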

0

ankostis Jul 23 '14 at 11:08


Casimir et Hippolyte · Accepted Answer · 2013-07-10T00:48:54+0000

You can use this pattern:

 [a-z]*([a-z])\1([a-z])\2[a-z]*

The idea is to use the backreferences \1 and \2, which refer to what the first and second capture groups matched.

Note that (\w){2} matches any two word characters, not necessarily the same character twice.
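A small sketch of how this could be tried out, assuming the character class is meant to be [a-z] (lowercase ASCII only) and that the whole word should match. The names TWO_PAIRS and is_two_pair_word are invented for the example.

```python
import re

# Two doubled letters back to back, anywhere inside a lowercase word.
TWO_PAIRS = re.compile(r'[a-z]*([a-z])\1([a-z])\2[a-z]*')

def is_two_pair_word(word):
    """Return True if the whole word matches the pattern (hypothetical helper)."""
    return TWO_PAIRS.fullmatch(word) is not None

print(is_two_pair_word('committee'))   # True
print(is_two_pair_word('bookkeeper'))  # True
print(is_two_pair_word('zoo'))         # False

# (\w){2} accepts any two word characters; (\w)\1 needs the same one twice.
print(re.search(r'(\w){2}', 'cat').group())  # ca
print(re.search(r'(\w)\1', 'cat'))           # None
```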
