Python: how to find consecutive pairs of letters by regular expression?

I want to find words that contain consecutive pairs of letters using a regex. I know that for a single pair, such as zoo (oo), puzzle (zz), arrangement (rr), this can be done with '(\w){2}'. But what about

  • two consecutive pairs: committee (ttee)
  • three consecutive pairs: bookkeeper (ookkee)

Edit:

  • '(\w){2}' is actually wrong: it matches any two letters, not a pair of identical letters.
  • I want to find the words that contain the letter pairs, not the pairs themselves.
  • By "consecutive," I mean that there is no other letter between the pairs.
+6
4 answers

You can use this pattern:

 [a-z]*([a-z])\1([a-z])\2[a-z]*

The idea is to use the backreferences \1 and \2, which refer to the capture groups.

Note that (\w){2} matches any two word characters, not necessarily the same character twice.
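
As a minimal sketch of the difference, assuming lowercase ASCII words, Python's re module, and the character class written as [a-z] (the helper name has_two_pairs and the sample words are only illustrative):

```python
import re

# Two adjacent doubled letters anywhere inside a lowercase word.
two_pairs = re.compile(r'[a-z]*([a-z])\1([a-z])\2[a-z]*')
# The pattern from the question: any two word characters, equal or not.
any_two = re.compile(r'(\w){2}')

def has_two_pairs(word):
    """Return True if the whole word matches the two-pair pattern."""
    return two_pairs.fullmatch(word) is not None

print(has_two_pairs('committee'))         # True
print(has_two_pairs('zoo'))               # False
print(any_two.search('cat') is not None)  # True
```

The last line shows why (\w){2} is not enough: it succeeds on 'cat', which has no doubled letter at all.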

+4

Use re.finditer

 >>> import re
 >>> [m.group() for m in re.finditer(r'((\w)\2)+', 'zoo')]
 ['oo']
 >>> [m.group() for m in re.finditer(r'((\w)\2)+', 'arrange')]
 ['rr']
 >>> [m.group() for m in re.finditer(r'((\w)\2)+', 'committee')]
 ['mm', 'ttee']
 >>> [m.group() for m in re.finditer(r'((\w)\2)+', 'bookkeeper')]
 ['ookkee']

Check whether the string contains two consecutive pairs:

 >>> bool(re.search(r'((\w)\2){2}', 'zoo'))
 False
 >>> bool(re.search(r'((\w)\2){2}', 'arrange'))
 False
 >>> bool(re.search(r'((\w)\2){2}', 'committee'))
 True
 >>> bool(re.search(r'((\w)\2){2}', 'bookkeeper'))
 True

You can also use the following version with a non-capturing group, (?:...):

 (?:(\w)\1){2} 
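
To generalize this to any number of pairs, the repetition count can be built into the pattern. This is only a sketch: the helper name has_consecutive_pairs and its parameter n are hypothetical. Keep in mind that \w also matches digits and underscores, not just letters.

```python
import re

def has_consecutive_pairs(word, n):
    """Return True if word contains n doubled characters in a row."""
    # (?:(\w)\1) is one doubled character; {n} requires n of them back to back.
    pattern = r'(?:(\w)\1){%d}' % n
    return re.search(pattern, word) is not None

print(has_consecutive_pairs('committee', 2))   # True
print(has_consecutive_pairs('committee', 3))   # False
print(has_consecutive_pairs('bookkeeper', 3))  # True
```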
+14

Since you mentioned that you want to check words from a list, here is how to do that, using falsetru's answer:

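 import re

 words = ['zoo', 'committee', 'cat']  # example input; replace with your own list
 newlist = []
 for word in words:
     # keep the word if it contains at least one doubled letter
     if re.search(r'((\w)\2)+', word):
         newlist.append(word)
 print(newlist)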
0

To detect a run of 2 or more identical consecutive letters, the regular expression becomes: (\w)\1+
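
Note that this matches runs of one repeated character rather than several different pairs in a row. A short sketch, with a hypothetical helper repeated_runs and made-up sample strings:

```python
import re

# (\w)\1+ : one word character followed by one or more copies of itself.
run = re.compile(r'(\w)\1+')

def repeated_runs(text):
    """Return every run of 2 or more identical word characters."""
    return [m.group() for m in run.finditer(text)]

print(repeated_runs('zoo'))        # ['oo']
print(repeated_runs('aaab'))       # ['aaa']
print(repeated_runs('committee'))  # ['mm', 'tt', 'ee']
```

For 'committee' the pairs 'tt' and 'ee' come back as separate matches, so this pattern alone does not tell you that they are adjacent.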

0

Source: https://habr.com/ru/post/949072/

