Return words with double consecutive letters

Question

I am trying to write a Python program that returns words containing double consecutive letters (e.g. door, ball, floor). My code so far is below; instead of those words, it returns every word in the file, split into two-letter chunks:

    def text_processing(file_location):
        import re
        file_variable = open(file_location)
        lines = file_variable.read()
        print(lines)
        double_letter = re.compile('[A-Z]{2,2}', re.IGNORECASE)
        double_letter_list = double_letter.findall(lines)
        print(double_letter_list)

+2

python regex

italianfoot Mar 6 '12 at 23:35

4 answers

Try this regex: r"\w*(\w)\1\w*"
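A minimal sketch of how this pattern might be applied, assuming Python 3 and a made-up sample string (the function name and sample are illustrative, not from the question). Because the pattern contains a capturing group, `re.findall` would return only the repeated letter, so the sketch uses `re.finditer` and reads the whole match instead:

```python
import re

# Pattern from the answer: a word containing some character repeated twice in a row.
DOUBLE_LETTER_WORD = re.compile(r"\w*(\w)\1\w*")

def words_with_double_letters(text):
    """Return every word in `text` that contains two identical consecutive characters."""
    # group(0) is the whole matched word; group(1) would be just the repeated character.
    return [m.group(0) for m in DOUBLE_LETTER_WORD.finditer(text)]

sample = "the door and the ball lay on the floor"  # hypothetical sample text
print(words_with_double_letters(sample))  # ['door', 'ball', 'floor']
```

Note that `\w` also matches digits and underscores, so a token such as `2200` would be reported as well.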

+6

user1096188 Mar 6 '12 at 23:51

    re.findall(r'(\w*(\w)\2\w*)', file_variable.read())

This returns a list of (word, repeated letter) tuples; to get just the words, take the first element of each tuple.

Example:

    >>> re.findall(r'(\w*(\w)\2\w*)', 'some words here: boo, shoo, wooooo, etc.')
    [('boo', 'o'), ('shoo', 'o'), ('wooooo', 'o')]
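To keep only the words, the first element of each tuple can be picked out with a list comprehension. A short sketch, assuming Python 3 and reusing the sample string from the example above (the variable names are illustrative):

```python
import re

text = "some words here: boo, shoo, wooooo, etc."

# Each findall result is a (word, repeated_letter) tuple; keep only the word.
pairs = re.findall(r"(\w*(\w)\2\w*)", text)
words = [word for word, _letter in pairs]
print(words)  # ['boo', 'shoo', 'wooooo']
```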

+2

campos.ddc Mar 07 '12 at 0:10

I think the problem is in your regular expression. Try a pattern like r'(.)\1' instead: the parenthesized group captures any single character, and \1 then requires that same character to appear again immediately after it.
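As an illustration of that pattern, the sketch below uses `re.search` to test whether each word contains a doubled character. It assumes Python 3; the helper name and the word list are made up for the example. Since `.` matches almost any character, a string such as `--` would count as a match too.

```python
import re

DOUBLED = re.compile(r"(.)\1")  # any character followed by the same character

def has_double(word):
    """Return True if `word` contains two identical consecutive characters."""
    return DOUBLED.search(word) is not None

candidates = ["door", "ball", "floor", "cat", "python"]  # hypothetical words
print([w for w in candidates if has_double(w)])  # ['door', 'ball', 'floor']
```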

You should also make sure the file gets closed, which is easiest if you read it inside a context manager:

    with open(file_location) as f:
        lines = f.read()

    >>> with open('/usr/share/dict/words') as f:
    ...     lines = [l.strip() for l in f.readlines()]
    ...
    >>> import re
    >>> for line in lines:
    ...     if re.findall(r'([a-z])\1', line.lower()):
    ...         print(line)
    ...
    Aachen
    Aachen's
    Aaliyah
    Aaliyah's
    Aaron
    Aaron's
    Abbas
    Abbasid
    Abbasid's
    Abbott
    Abbott's
    Abby
    Abby's
    Aberdeen
    Aberdeen's
    Abyssinia
    Abyssinia's
    Abyssinian
    Accra
    Accra's
    Achilles
    Acuff
    ...

0

wim Mar 6 '12 at 23:46

veiset · Accepted Answer · 2012-03-06T23:57:14+0000

You can try the following:

    import re

    def text_processing(file_location):
        # Read all lines; the context manager closes the file afterwards.
        with open(file_location) as file_variable:
            lines = file_variable.readlines()

        # Any character immediately followed by itself, anywhere in the word.
        double_letter = re.compile(r'.*(.)\1.*', re.IGNORECASE)
        double_letter_list = []

        for line in lines:
            for word in line.split(" "):
                match = double_letter.match(word)
                if match:
                    double_letter_list.append(match.group())

        print(double_letter_list)

This tries to match the pattern against every word in the file and, whenever a word matches, appends it to the list of double-letter words.
