Python regex to remove duplicate words

Question

I am very new to Python.

I want to change a sentence when it contains the same word several times in a row.

Correct:

Ex. "this just so so so nice" → "this just so nice"
Ex. "this is just is is" → "this is just is"

Right now I am using this regex, but it also changes repeated letters inside words. Ex. "My friend and I am happy" comes out wrong (it deletes the "i" and the space). ERROR

text = re.sub(r'(\w+)\1', r'\1', text) #remove duplicated words in row

How can I make the same substitution, but have it check whole words instead of letters?

+4

python regex

boje Jun 21 '13 at 15:08

2 answers

Non-regex solution using itertools.groupby:

 >>> strs = "this is just is is"
 >>> from itertools import groupby
 >>> " ".join([k for k,v in groupby(strs.split())])
 'this is just is'
 >>> strs = "this just so so so nice"
 >>> " ".join([k for k,v in groupby(strs.split())])
 'this just so nice'
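If you need this in more than one place, the same idea could be wrapped in a small function. This is only a sketch: the name dedupe_adjacent and the ignore_case option are my own additions for illustration, not part of the answer above. Note that split() followed by " ".join() also normalizes any run of whitespace to a single space.

```python
from itertools import groupby

def dedupe_adjacent(text, ignore_case=False):
    """Collapse runs of identical neighbouring words (hypothetical helper)."""
    words = text.split()  # splits on any run of whitespace
    key = str.lower if ignore_case else None  # None means compare words as-is
    # groupby yields one group per run of equal keys; keep each run's first word
    return " ".join(next(group) for _, group in groupby(words, key=key))

print(dedupe_adjacent("this is just is is"))             # this is just is
print(dedupe_adjacent("this just so so so nice"))        # this just so nice
print(dedupe_adjacent("The the cat", ignore_case=True))  # The cat
```

Only words that sit directly next to each other are merged, so the two separate "is" in the first example both survive.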

+7

Ashwini chaudhary Jun 21 '13 at 15:10

tom · Accepted Answer · 2013-06-21T15:15:18+0000

 import re
 text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)  # remove consecutive duplicate words

\b matches an empty string, but only at the beginning or end of a word.
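To see what the word boundaries buy you, here is a quick comparison of the two patterns. The wrapper name dedupe_words is just something I made up for the demo; the patterns themselves are the ones from the question and from this answer.

```python
import re

def dedupe_words(text):
    # \b(\w+) captures a whole word; ( \1\b)+ then matches one or more
    # further copies of it, each preceded by a single space.
    return re.sub(r'\b(\w+)( \1\b)+', r'\1', text)

print(dedupe_words("this just so so so nice"))   # this just so nice
print(dedupe_words("this is just is is"))        # this is just is
print(dedupe_words("My friend and I am happy"))  # My friend and I am happy

# The pattern from the question has no \b and no space, so it also
# collapses a repeated letter in the middle of a word:
print(re.sub(r'(\w+)\1', r'\1', "happy"))        # hapy
```

The pattern expects exactly one space between the repeated words and compares them case-sensitively, so "The the" or words separated by a tab are left alone.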
