Python regex to remove duplicate words

I am very new to Python.

I want to change a sentence if it contains the same word several times in a row.

Correct:

  • Ex. "this just so so so nice" → "this just so nice"
  • Ex. "this is just is is" → "this is just is"

Right now I am using this regex, but it also changes letters, not just whole words. Ex. "My friend and I am happy" → "My friend and am happy" (it deletes the "I" and the space), which is an error.

text = re.sub(r'(\w+)\1', r'\1', text)  # remove duplicated words in a row

How can I do the same replacement, but have it compare whole words instead of letters?

2 answers
text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)  # remove duplicated words in a row

\b matches an empty string, but only at the beginning or end of a word.
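
To see what the word boundaries change, here is a minimal sketch. The function name dedupe_words and the test sentences are only for illustration, and it assumes the repeated words are separated by exactly one space, as the pattern requires.

```python
import re

def dedupe_words(text):
    # \b(\w+) captures one whole word; ( \1\b)+ then matches one or more
    # further copies of that word, each preceded by a single space.
    # The whole run is replaced by the single captured word.
    return re.sub(r'\b(\w+)( \1\b)+', r'\1', text)

print(dedupe_words("this just so so so nice"))   # this just so nice
print(dedupe_words("this is just is is"))        # this is just is
print(dedupe_words("My friend and I am happy"))  # My friend and I am happy

# Without \b the pattern is free to match inside a word,
# so a doubled letter is collapsed as well:
print(re.sub(r'(\w+)\1', r'\1', "happy"))        # hapy
```

Note that the "is" at the end of "this" is left alone, because \b does not match in the middle of a word.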


Non-regex solution using itertools.groupby:

>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k,v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice"
>>> " ".join([k for k,v in groupby(strs.split())])
'this just so nice'
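
The same idea wrapped in a function, as a rough sketch (the name dedupe_words_groupby is made up for this example). groupby collects equal adjacent items into one group, so keeping only the key of each group drops the consecutive repeats.

```python
from itertools import groupby

def dedupe_words_groupby(text):
    # split() breaks the text on any run of whitespace; groupby() then
    # groups equal neighbouring words, and we keep one key per group.
    return " ".join(key for key, _group in groupby(text.split()))

print(dedupe_words_groupby("this is just is is"))       # this is just is
print(dedupe_words_groupby("this just so so so nice"))  # this just so nice
print(dedupe_words_groupby("so  so\tnice"))             # so nice
```

Unlike the regex, this version also rewrites the whitespace: every run of spaces or tabs comes back as a single space. Words are compared exactly, so "So so" or "so so," would not be treated as repeats.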

Source: https://habr.com/ru/post/1487493/

