Regex for repeating words in a string in Python

I have a good regex for replacing repeated characters in a string. Now I also need to replace repeated words: three or more consecutive repetitions of a word should be replaced by two.

For example,

bye! bye! bye! 

should become

 bye! bye! 

My code is:

    import re

    def replaceThreeOrMoreCharachetrsWithTwoCharacters(string):
        # Pattern to look for three or more repetitions of any character, including newlines.
        pattern = re.compile(r"(.)\1{2,}", re.DOTALL)
        return pattern.sub(r"\1\1", string)
5 answers

Assuming that by "word" you mean one or more non-whitespace characters surrounded by whitespace or the start/end of the string, you can try this pattern:

 re.sub(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)', r'\1', s) 
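To make the pattern easier to read, here is a sketch of the same regex written with `re.VERBOSE`, which only adds whitespace and comments and does not change what it matches. The names `REPEATED_WORDS` and `squeeze_repeats` are hypothetical, chosen for illustration:

```python
import re

# Same pattern as the one-liner above, spelled out with re.VERBOSE.
REPEATED_WORDS = re.compile(
    r"""
    (?<!\S)        # not preceded by a non-space: we are at the start of a word
    (              # group 1: the two copies we keep
        (\S+)      # group 2: the word itself
        (?:\s+\2)  # whitespace, then the same word again
    )
    (?:\s+\2)+     # one or more further copies, which get dropped
    (?!\S)         # not followed by a non-space: the last copy is a whole word
    """,
    re.VERBOSE,
)


def squeeze_repeats(s):
    # Replace the whole run with group 1, i.e. only the first two copies.
    return REPEATED_WORDS.sub(r"\1", s)


print(repr(squeeze_repeats("bye! bye! bye! ")))  # 'bye! bye! '
```

The lookarounds are what keep partial words from matching: in `"a a a ab"` the trailing `ab` is not treated as another `a`, so only the first three `a`s form the run.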

You can also try the following regular expression:

 (?<= |^)(\S+)(?: \1){2,}(?= |$) 

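Code example (this uses the third-party `regex` module; the standard `re` module rejects `(?<= |^)` because the alternatives inside the lookbehind have different widths):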

    >>> import regex
    >>> s = "hi hi hi hi some words words words which'll repeat repeat repeat repeat repeat"
    >>> m = regex.sub(r'(?<= |^)(\S+)(?: \1){2,}(?= |$)', r'\1 \1', s)
    >>> m
    "hi hi some words words which'll repeat repeat"


I know you asked for a regex, but a simple loop achieves the same thing:

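    def max_repeats(s, max=2):
        last = ''
        same = 0
        out = []
        for word in s.split():
            # Count how many times in a row this word has already appeared.
            same = 0 if word != last else same + 1
            # Keep the word only while the run is shorter than `max`.
            if same < max:
                out.append(word)
            last = word
        return ' '.join(out)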

As a bonus, you can specify a different maximum number of repetitions (default 2). Note that runs of several spaces between words are collapsed into a single space. Whether that is a bug or a feature is up to you :)
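If you want the configurable maximum without losing the original spacing, the regex from the first answer can be parameterised instead. This is a minimal sketch that assumes a "word" is any run of non-whitespace, as in that answer; `collapse_repeats` and its `keep` parameter are hypothetical names:

```python
import re


def collapse_repeats(text, keep=2):
    """Keep at most `keep` consecutive copies of each word (hypothetical helper).

    The whitespace between the copies that are kept is left untouched; the
    separators in front of the dropped copies disappear along with them.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Group 1 = the word (group 2) plus keep - 1 further copies;
    # the trailing (?:\s+\2)+ matches the extra copies that get removed.
    pattern = re.compile(
        rf"(?<!\S)((\S+)(?:\s+\2){{{keep - 1}}})(?:\s+\2)+(?!\S)"
    )
    return pattern.sub(r"\1", text)


print(repr(collapse_repeats("hi  hi hi hi there")))  # 'hi  hi there'
print(repr(collapse_repeats("x x x x", keep=3)))     # 'x x x'
```

With `keep=2` the generated pattern is exactly the one from the first answer; the double space between the two kept `hi`s survives, unlike with `s.split()`.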


Try the following:

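    import re

    s = "bye! bye! bye!"  # your string
    # A whole word followed by two or more space-separated copies of itself.
    s = re.sub(r'(?<!\S)(\S+)(?: \1){2,}(?!\S)', r'\1 \1', s)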

You can see sample code here: http://codepad.org/YyS9JCLO

    import re

    def replaceThreeOrMoreWordsWithTwoWords(string):
        # Pattern to look for three or more repetitions of any word.
        pattern = re.compile(r"(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)", re.DOTALL)
        return pattern.sub(r"\1", string)

Source: https://habr.com/ru/post/1201036/

