Perhaps regular expressions are not needed at all.
itertools.groupby does the job: it is designed to group runs of equal consecutive elements. The steps are:
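A quick illustration of what groupby actually yields. The sample string "a.a.b.a" is made up for this sketch; notice that the last "a" forms its own group, because only adjacent equal items are merged:

```python
import itertools

# groupby yields (key, iterator) pairs, one pair per run of equal *consecutive* items
words = "a.a.b.a".split(".")
runs = [(key, list(group)) for key, group in itertools.groupby(words)]
print(runs)  # [('a', ['a', 'a']), ('b', ['b']), ('a', ['a'])]
```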
- group the words (after splitting on dots)
- convert each group to a list and build a (word, count) tuple, keeping it only if the length is > 1
like this:
import itertools
s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"
matches = [(l[0], len(l)) for l in (list(v) for k, v in itertools.groupby(s.split("."))) if len(l) > 1]
result:
[('name', 2), ('father', 3)]
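If you would rather not build a list for every group, one possible variant counts each run with sum() over the group iterator. The function name repeated_runs and its sep parameter are my own hypothetical choices, not part of itertools:

```python
import itertools

def repeated_runs(s, sep="."):
    """Return (word, run_length) for every run of 2+ identical consecutive words."""
    result = []
    for word, group in itertools.groupby(s.split(sep)):
        count = sum(1 for _ in group)  # count the run without storing its items
        if count > 1:
            result.append((word, count))
    return result

s = "My.name.name.is.Inigo.Montoya.You.killed.my.father.father.father.Prepare.to.die"
print(repeated_runs(s))  # [('name', 2), ('father', 3)]
```

The result is the same as the list-based version; the only difference is that the groups are consumed lazily.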
From here you can do whatever you want with this list of tuples (for example, filter it by count).
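For example, starting from the result above (the threshold of 3 is an arbitrary choice for illustration):

```python
matches = [('name', 2), ('father', 3)]

# keep only words repeated at least 3 times in a row
at_least_three = [(word, n) for word, n in matches if n >= 3]
print(at_least_three)  # [('father', 3)]

# or turn the tuples into a dict to look counts up by word
counts = dict(matches)
print(counts["name"])  # 2
```

Keep in mind that if the same word appears in two separate runs, dict() keeps only the count of the last run.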
Bonus (I first misunderstood the question, so I'm leaving this here): remove the repeated words from the sentence.
- group by words (after splitting on dots), as described above
- in a list comprehension, keep only the key of each (key, group) pair returned by groupby (the groups themselves are not needed, since we don't count them)
- join with a dot
In one line (still using itertools):
new_s = ".".join([k for k,_ in itertools.groupby(s.split("."))])
result:
My.name.is.Inigo.Montoya.You.killed.my.father.Prepare.to.die
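Note that this only removes consecutive duplicates. If you want to drop every repeated word regardless of position, a different technique, dict.fromkeys (not itertools), does it while preserving the original order (guaranteed from Python 3.7). The sample sentence below is made up for the comparison:

```python
import itertools

s = "to.be.or.not.to.be"
# groupby only collapses adjacent repeats, so nothing changes here
consecutive = ".".join(k for k, _ in itertools.groupby(s.split(".")))
# dict.fromkeys keeps the first occurrence of every word, in order
everywhere = ".".join(dict.fromkeys(s.split(".")))
print(consecutive)  # to.be.or.not.to.be
print(everywhere)   # to.be.or.not
```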