How to remove consecutive duplicate characters and keep the first one using a regular expression?

Question

I found a code snippet on a website that uses a regular expression in Python to remove consecutive duplicate characters and keep the first one, for example:

import re
re.sub(r'(?s)(.)(?=.*\1)','','aabbcc')  #'abc'

But it has a defect: if the input is "aabbccaabb", it drops the first "aa" and "bb" entirely and produces "cab".

re.sub(r'(?s)(.)(?=.*\1)','','aabbccaabb')  #'cab'

Is there a way to solve it with regex?

python string regex

Haohu shen Jan 14 '17 at 14:03

2 answers

Without regex, using a list comprehension that compares each character with the previous one:

s='aabbccaabb'
print("".join([c for i,c in enumerate(s) if i==0 or s[i-1]!=c]))
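The same comparison of neighbouring characters can also be sketched with itertools.groupby from the standard library. This is an illustrative variant assuming Python 3; the function name squeeze is hypothetical.

```python
from itertools import groupby

def squeeze(text):
    # groupby yields one (key, group) pair per run of equal adjacent characters;
    # keeping only the key keeps exactly one character per run.
    return "".join(key for key, _ in groupby(text))

print(squeeze('aabbccaabb'))  # abcab
```

Unlike a set-based approach, this only merges adjacent repeats, so the later "aabb" still contributes "ab".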

Jean-François Fabre Jan 14 '17 at 14:08

MYGz · Accepted Answer · 2017-01-14T14:06:42+0000

Just remove the .* from the positive lookahead, so that a character is dropped only when the very next character is the same.

import re

print(re.sub(r'(?s)(.)(?=\1)','','aabbcc'))
print(re.sub(r'(?s)(.)(?=\1)','','aabbccaabb'))

Output:

abc
abcab
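
Strictly speaking, the lookahead pattern keeps the last character of each run rather than the first, which yields the same string. A minimal sketch that instead captures the first character and replaces the whole run with it, assuming Python 3; the function name collapse_runs is hypothetical:

```python
import re

def collapse_runs(text):
    # (.) captures one character; \1+ matches one or more immediate repeats of it.
    # The whole run is then replaced by the captured first character.
    # (?s) lets "." match newlines too, so runs of blank lines also collapse.
    return re.sub(r'(?s)(.)\1+', r'\1', text)

print(collapse_runs('aabbcc'))      # abc
print(collapse_runs('aabbccaabb'))  # abcab
```

This variant performs one substitution per run instead of one per removed character, which may matter for long runs.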
