Return words with double consecutive letters

I am trying to write a Python program that returns words containing double consecutive letters (e.g. door, ball, floor). My code so far is below, but it returns all the words in the file split into two-letter chunks:

    def text_processing( file_location ):
        import re
        file_variable = open( file_location )
        lines = file_variable.read()
        print lines
        double_letter = re.compile('[A-Z]{2,2}', re.IGNORECASE)
        double_letter_list = double_letter.findall(lines)
        print double_letter_list
+2
4 answers

You can try the following:

    import re

    def text_processing(file_location):
        # (.)\1 matches any character followed by the same character again
        double_letter = re.compile(r'.*(.)\1.*', re.IGNORECASE)
        double_letter_list = []
        with open(file_location) as file_variable:
            for line in file_variable:
                # split() with no argument also drops the trailing newline
                for word in line.split():
                    match = double_letter.match(word)
                    if match:
                        double_letter_list.append(match.group())
        print(double_letter_list)

It tries to match the pattern against every word in the file and, if a word matches, appends it to the list of double-letter words.

+1

Try this regex: r"\w*(\w)\1\w*"
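
A minimal sketch of how this pattern might be applied, assuming Python 3 and that a "word" is any run of `\w` characters (so digits and underscores count as well). The function name `words_with_double_letters` is hypothetical. `re.finditer` is used here because `re.findall` would return only the captured letter when the pattern contains a single group.

```python
import re

# \w*(\w)\1\w* : any word characters, one captured character, that same
# character again, then the rest of the word.
DOUBLE_LETTER_WORD = re.compile(r"\w*(\w)\1\w*")


def words_with_double_letters(text):
    """Return every word in text that contains two identical adjacent characters."""
    return [match.group(0) for match in DOUBLE_LETTER_WORD.finditer(text)]


print(words_with_double_letters("open the door and kick the ball over the floor"))
# ['door', 'ball', 'floor']
```

As written, the match is case-sensitive, so a word like "Aaron" would not be reported unless `re.IGNORECASE` is passed to `re.compile`.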

+6
    re.findall(r'(\w*(\w)\2\w*)', file_variable.read())

This returns a list of (word, repeated letter) tuples; take the first element of each tuple to get the words.

Example:

    >>> re.findall(r'(\w*(\w)\2\w*)', 'some words here: boo, shoo, wooooo, etc.')
    [('boo', 'o'), ('shoo', 'o'), ('wooooo', 'o')]
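
Taking the first elements could look like the sketch below, assuming Python 3. The variable names `pairs` and `words` are only illustrative.

```python
import re

text = "some words here: boo, shoo, wooooo, etc."
pairs = re.findall(r"(\w*(\w)\2\w*)", text)

# Each tuple is (whole word, repeated letter); keep only the word.
words = [word for word, _letter in pairs]
print(words)  # ['boo', 'shoo', 'wooooo']
```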
+2

I think the problem is in your regular expression. Try a pattern like r'(.)\1' instead: the parenthesized group captures any single character, and \1 then requires that same character to appear again immediately after it.

You should also make sure the file gets closed, which is easiest if you read it inside a context manager:

    with open(file_location) as f:
        lines = f.read()

    >>> with open('/usr/share/dict/words') as f:
    ...     lines = [l.strip() for l in f.readlines()]
    ...
    >>> import re
    >>> for line in lines:
    ...     if re.findall(r'([a-z])\1', line.lower()):
    ...         print(line)
    ...
    Aachen
    Aachen's
    Aaliyah
    Aaliyah's
    Aaron
    Aaron's
    Abbas
    Abbasid
    Abbasid's
    Abbott
    Abbott's
    Abby
    Abby's
    Aberdeen
    Aberdeen's
    Abyssinia
    Abyssinia's
    Abyssinian
    Accra
    Accra's
    Achilles
    Acuff
    ...
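
Putting the two suggestions together, a function version might look like the sketch below. It assumes Python 3, treats words as whitespace-separated tokens (so attached punctuation such as a trailing comma stays on the word), and restricts the match to ASCII letters. The name `double_letter_words` is hypothetical.

```python
import re

# Two identical ASCII letters in a row; IGNORECASE lets "Aa" count as a pair.
DOUBLE_LETTER = re.compile(r"([a-z])\1", re.IGNORECASE)


def double_letter_words(file_location):
    """Return the whitespace-separated words in the file that contain a double letter."""
    with open(file_location) as f:  # the file is closed when the block exits
        return [
            word
            for line in f
            for word in line.split()
            if DOUBLE_LETTER.search(word)
        ]
```

`search` is used rather than `findall` because only a yes/no answer is needed for each word.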
0

Source: https://habr.com/ru/post/949075/

