How to remove duplicated consecutive characters and keep the first one using a regular expression?

I found a Python code snippet on a website that uses a regular expression to remove duplicated consecutive characters and keep the first character, for example:

import re
re.sub(r'(?s)(.)(?=.*\1)','','aabbcc')  #'abc'

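But it has a defect: for the string "aabbccaabb" it drops the leading "aa" and "bb" runs entirely and produces "cab". The lookahead (?=.*\1) deletes every character that occurs again anywhere later in the string, not only immediately after it.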

re.sub(r'(?s)(.)(?=.*\1)','','aabbccaabb')  #'cab'
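To see why, here is a plain-Python sketch of what the first pattern does (the helper name keep_last_occurrence is hypothetical, used only for illustration). With (?s), the lookahead (?=.*\1) asserts that the same character appears somewhere later, so a character is deleted whenever it reappears anywhere after it. The net effect is that only the last occurrence of each distinct character survives:

```python
import re

def keep_last_occurrence(s: str) -> str:
    # Hypothetical helper: same effect as re.sub(r'(?s)(.)(?=.*\1)', '', s).
    # A character is dropped if the same character occurs anywhere later.
    return "".join(c for i, c in enumerate(s) if c not in s[i + 1:])

for s in ["aabbcc", "aabbccaabb"]:
    # Both columns agree: 'abc' for the first string, 'cab' for the second.
    print(s, re.sub(r'(?s)(.)(?=.*\1)', '', s), keep_last_occurrence(s))
```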

Is there a way to solve it with regex?

2 answers

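Just remove the .* from the positive lookahead. Then (?=\1) only checks the very next character, so each character that is immediately followed by an identical one is deleted, leaving one character per run.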

import re

print(re.sub(r'(?s)(.)(?=\1)', '', 'aabbcc'))
print(re.sub(r'(?s)(.)(?=\1)', '', 'aabbccaabb'))

Output:

abc
abcab
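A closely related pattern, shown here as a sketch (the function name squeeze_runs is hypothetical), matches each whole run and replaces it with its first character through a backreference, so it literally keeps the first character of the run rather than the last one:

```python
import re

def squeeze_runs(s: str) -> str:
    # (.) captures one character; \1+ matches one or more further copies of it.
    # The whole run is replaced by the captured character (r'\1').
    # (?s) lets '.' match newlines too, so runs of '\n' are squeezed as well.
    return re.sub(r'(?s)(.)\1+', r'\1', s)

print(squeeze_runs('aabbcc'))      # abc
print(squeeze_runs('aabbccaabb'))  # abcab
```

Since every character in a run is identical, the result is the same as with the lookahead version; the difference is only which copy is conceptually kept.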

Alternatively, without a regex:

s = 'aabbccaabb'
# Keep a character if it is the first one or differs from the one before it.
print("".join([c for i, c in enumerate(s) if i == 0 or s[i - 1] != c]))  # abcab
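Another non-regex option, sketched with the standard library's itertools.groupby (the helper name squeeze_runs_groupby is hypothetical): groupby yields one (key, group) pair per run of equal consecutive characters, so keeping only the keys keeps one character per run:

```python
from itertools import groupby

def squeeze_runs_groupby(s: str) -> str:
    # Each key is the repeated character of one consecutive run;
    # the group iterator (the run itself) is ignored.
    return "".join(key for key, _ in groupby(s))

print(squeeze_runs_groupby('aabbccaabb'))  # abcab
```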

Source: https://habr.com/ru/post/1666781/

