A solution without itertools.groupby():

    p = ['**', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'baz', '**', '**',
         'foo', '*','*', 'bar', 'bar','bar', '**', '**','foo','bar',]

    def treat(A):
        prec = A[0]; yield prec
        for x in A[1:]:
            if (prec,x)!=('**','**'):
                yield x
            prec = x

    print p
    print
    print list(treat(p))
result
    ['**', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'baz', '**', '**', 'foo', '*', '*', 'bar', 'bar', 'bar', '**', '**', 'foo', 'bar']

    ['**', 'foo', '*', 'bar', 'bar', '**', 'baz', '**', 'foo', '*', '*', 'bar', 'bar', 'bar', '**', 'foo', 'bar']
Another solution, inspired by dugres's one:

    from itertools import groupby

    p = ['**', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'baz', '**', '**',
         'foo', '*','*', 'bar', 'bar','bar', '**', '**','foo','bar',]

    res = []
    for k, g in groupby(p):
        res.extend( ['**'] if k=='**' else list(g) )
    print res
This is similar to Tom Zych's solution, but simpler
.
EDIT
    p = ['**','**', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'baz', '**', '**',
         'foo', '*','*', 'bar', 'bar','bar', '**', '**','foo','bar', '**', '**', '**']

    q= ['**',12,'**',45, 'foo',78, '*',751, 'bar',4789, 'bar',3, '**', 5,'**',7, '**',
        73,'baz',4, '**',8, '**',20,'foo', 8,'*',36,'*', 36,'bar', 11,'bar',0,'bar',9,
        '**', 78,'**',21,'foo',27,'bar',355, '**',33, '**',37, '**','end']

    def treat(B,dedupl):
        B = iter(B)
        prec = B.next(); yield prec
        for x in B:
            if not(prec==x==dedupl):
                yield x
            prec = x

    print 'gen = ( x for x in q[::2])'
    gen = ( x for x in q[::2])
    print 'list(gen)==p is ',list(gen)==p
    gen = ( x for x in q[::2])
    print 'list(treat(gen)==',list(treat(gen,'**'))

    ch = '??h4i4???4t4y?45l????hmo4j5???'
    print '\nch==',ch
    print "''.join(treat(ch,'?'))==",''.join(treat(ch,'?'))

    print "\nlist(treat([],'%%'))==",list(treat([],'%%'))
result
    gen = ( x for x in q[::2])
    list(gen)==p is True
    list(treat(gen)== ['**', 'foo', '*', 'bar', 'bar', '**', 'baz', '**', 'foo', '*', '*', 'bar', 'bar', 'bar', '**', 'foo', 'bar', '**']

    ch== ??h4i4???4t4y?45l????hmo4j5???
    ''.join(treat(ch,'?'))== ?h4i4?4t4y?45l?hmo4j5?

    list(treat([],'%%'))== []
.
Note: with the generator function, the output can be adapted to the type of the input simply by wrapping the call to the generator (in list() or ''.join(), for example); the internal code of the generator function does not need to change.
This is not the case with Tom Zych's solution, which cannot be adapted to the input type so easily.
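To illustrate the note, here is a minimal sketch of the same generator written so that it also runs on Python 3 (an assumption, since the code in this answer is Python 2). The only change is that `B.next()` becomes `next(B)` inside a `try`, because on Python 3.7 and later a `StopIteration` escaping from a generator body is turned into a `RuntimeError`. The caller picks the output type by choosing the wrapper.

```python
def treat(B, dedupl):
    """Yield the items of B, collapsing each consecutive run of dedupl into one."""
    B = iter(B)
    try:
        prec = next(B)           # Python 2 spelling in the answer: B.next()
    except StopIteration:
        return                   # empty input: yield nothing
    yield prec
    for x in B:
        if not (prec == x == dedupl):
            yield x
        prec = x

# Same generator, three output types chosen by the wrapper around the call.
as_list = list(treat(['**', '**', 'foo', '**', '**'], '**'))
as_tuple = tuple(treat(('**', '**', 'foo', '**', '**'), '**'))
as_str = ''.join(treat('??h4i4???4t4y?45l????hmo4j5???', '?'))
```

The generator itself never needs to know whether it was fed a list, a tuple, a string or another generator.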
.
EDIT 2
I was looking for a one-line method, using a list comprehension or a generator expression.
I found a way to do it, and I think it is impossible without groupby():

    from itertools import groupby
    from operator import concat

    p = ['**', '**','foo', '*', 'bar', 'bar', '**', '**', '**',
         'bar','**','foo','sun','sun','sun']
    print 'p==',p,'\n'
    dedupl = ("**",'sun')
    print 'dedupl==',repr(dedupl)

    print [ x for k, g in groupby(p) for x in ((k,) if k in dedupl else g) ]
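To see why the comprehension works, it may help to look at what groupby() produces: one (key, group) pair for each run of equal consecutive items. The sketch below uses a shorter list than the one above (the data is only illustrative) and is written to run on Python 2 or 3.

```python
from itertools import groupby

p = ['**', '**', 'foo', 'bar', 'bar', '**', 'sun', 'sun']
dedupl = ('**', 'sun')

# Each run of equal consecutive items becomes one (key, group) pair;
# here the group is reduced to its length to make the runs visible.
runs = [(k, len(list(g))) for k, g in groupby(p)]

# Same comprehension as in the answer: a run of a victim contributes only
# its key, any other run contributes all of its items.
squeezed = [x for k, g in groupby(p) for x in ((k,) if k in dedupl else g)]
```

Here `runs` is `[('**', 2), ('foo', 1), ('bar', 2), ('**', 1), ('sun', 2)]` and `squeezed` is `['**', 'foo', 'bar', 'bar', '**', 'sun']`.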
Based on the same principle, it is easy to convert the dugres function to a generator function:
    from itertools import groupby

    def compress(iterable, to_compress):
        for k, g in groupby(iterable):
            if k in to_compress:
                yield k
            else:
                for x in g:
                    yield x
However, this generator function has two drawbacks:
it calls the groupby() function, which is not so easy to understand for someone not used to Python
its execution time is longer than that of my generator function treat() and of John Machin's generator function, neither of which uses groupby().
I modified them a bit to make them accept a sequence of elements to be deduplicated, and I measured the execution time:
    from time import clock
    from itertools import groupby

    def squeeze(iterable, victims, _dummy=object()):
        if hasattr(iterable, '__iter__') and not hasattr(victims, '__iter__'):
            victims = (victims,)
        previous = _dummy
        for item in iterable:
            if item in victims and item==previous:
                continue
            previous = item
            yield item

    def treat(B,victims):
        if hasattr(B, '__iter__') and not hasattr(victims, '__iter__'):
            victims = (victims,)
        B = iter(B)
        prec = B.next(); yield prec
        for x in B:
            if x not in victims or x!=prec:
                yield x
            prec = x

    def compress(iterable, to_compress):
        if hasattr(iterable, '__iter__') and not hasattr(to_compress, '__iter__'):
            to_compress = (to_compress,)
        for k, g in groupby(iterable):
            if k in to_compress:
                yield k
            else:
                for x in g:
                    yield x

    p = ['**', '**','su','foo', '*', 'bar', 'bar', '**', '**', '**', 'su','su','**',
         'bin', '*','*','bar','bar','su','su','su']

    n = 10000

    te = clock()
    for i in xrange(n):
        a = list(compress(p,('**','sun')))
    print clock()-te,' generator function with groupby()'

    te = clock()
    for i in xrange(n):
        b = list(treat(p,('**','sun')))
    print clock()-te,' generator function eyquem'

    te = clock()
    for i in xrange(n):
        c = list(squeeze(p,('**','sun')))
    print clock()-te,' generator function John Machin'

    print p
    print 'a==b==c is ',a==b==c
    print a
The instruction

    if hasattr(iterable, '__iter__') and not hasattr(to_compress, '__iter__'):
        to_compress = (to_compress,)

is necessary to avoid errors when the iterable argument is a sequence and the other argument is a single string: the latter must then be wrapped in a container, provided that the iterable argument is not itself a string.
It relies on the fact that, in Python 2, containers like tuples, lists, sets ... have an __iter__ method, but strings don't. The following code shows the problems:
    def compress(iterable, to_compress):
        if hasattr(iterable, '__iter__') and not hasattr( to_compress, '__iter__'):
            to_compress = (to_compress,)
        print 't_compress==',repr(to_compress)
        for k, g in groupby(iterable):
            if k in to_compress:
                yield k
            else:
                for x in g:
                    yield x

    def compress_bof(iterable, to_compress):
        if not hasattr(to_compress, '__iter__'): # to_compress is a string
            to_compress = (to_compress,)
        print 't_compress==',repr(to_compress)
        for k, g in groupby(iterable):
            if k in to_compress:
                yield k
            else:
                for x in g:
                    yield x

    def compress_bug(iterable, to_compress_bug):
        print 't_compress==',repr(to_compress_bug)
        for k, g in groupby(iterable):
            #print 'k==',k,k in to_compress_bug
            if k in to_compress_bug:
                yield k
            else:
                for x in g:
                    yield x

    q = ';;;htr56;but78;;;;$$$$;ios4!'
    print 'q==',q
    dedupl = ";$"
    print 'dedupl==',repr(dedupl)
    print
    print "''.join(compress (q,"+repr(dedupl)+")) :\n",''.join(compress (q,dedupl))+\
          ' <-CORRECT ONE'
    print
    print "''.join(compress_bof(q,"+repr(dedupl)+")) :\n",''.join(compress_bof(q,dedupl))+\
          ' <====== error ===='
    print
    print "''.join(compress_bug(q,"+repr(dedupl)+")) :\n",''.join(compress_bug(q,dedupl))

    print '\n\n\n'

    q = [';$', ';$',';$','foo', ';', 'bar','bar',';',';',';','$','$','foo',';$12',';$12']
    print 'q==',q
    dedupl = ";$12"
    print 'dedupl==',repr(dedupl)
    print
    print 'list(compress (q,'+repr(dedupl)+')) :\n',list(compress (q,dedupl)),\
          ' <-CORRECT ONE'
    print
    print 'list(compress_bof(q,'+repr(dedupl)+')) :\n',list(compress_bof(q,dedupl))
    print
    print 'list(compress_bug(q,'+repr(dedupl)+')) :\n',list(compress_bug(q,dedupl)),\
          ' <====== error ===='
    print
result
    q== ;;;htr56;but78;;;;$$$$;ios4!
    dedupl== ';$'

    ''.join(compress (q,';$')) :
    t_compress== ';$'
    ;htr56;but78;$;ios4! <-CORRECT ONE

    ''.join(compress_bof(q,';$')) :
    t_compress== (';$',)
    ;;;htr56;but78;;;;$$$$;ios4! <====== error ====

    ''.join(compress_bug(q,';$')) :
    t_compress== ';$'
    ;htr56;but78;$;ios4!




    q== [';$', ';$', ';$', 'foo', ';', 'bar', 'bar', ';', ';', ';', '$', '$', 'foo', ';$12', ';$12']
    dedupl== ';$12'

    list(compress (q,';$12')) :
    t_compress== (';$12',)
    [';$', ';$', ';$', 'foo', ';', 'bar', 'bar', ';', ';', ';', '$', '$', 'foo', ';$12'] <-CORRECT ONE

    list(compress_bof(q,';$12')) :
    t_compress== (';$12',)
    [';$', ';$', ';$', 'foo', ';', 'bar', 'bar', ';', ';', ';', '$', '$', 'foo', ';$12']

    list(compress_bug(q,';$12')) :
    t_compress== ';$12'
    [';$', 'foo', ';', 'bar', 'bar', ';', '$', 'foo', ';$12'] <====== error ====
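The hasattr() test depends on Python 2 behaviour: in Python 3, str objects do have an `__iter__` method, so the test can no longer tell a string from a container. A minimal sketch of the same idea for Python 3, assuming an explicit `isinstance(..., str)` check is acceptable (the name `compress3` is hypothetical and only used for this example):

```python
from itertools import groupby

def compress3(iterable, to_compress):
    """Collapse each run of an item of to_compress into a single item."""
    # A lone string is wrapped in a tuple, unless the iterable is itself a
    # string, in which case each character of to_compress is a victim.
    if isinstance(to_compress, str) and not isinstance(iterable, str):
        to_compress = (to_compress,)
    for k, g in groupby(iterable):
        if k in to_compress:
            yield k
        else:
            for x in g:
                yield x

text = ''.join(compress3(';;;htr56;but78;;;;$$$$;ios4!', ';$'))
items = list(compress3([';$', ';$', 'foo', ';', ';', ';$12', ';$12'], ';$12'))
```

With these inputs, `text` is `';htr56;but78;$;ios4!'` and `items` is `[';$', ';$', 'foo', ';', ';', ';$12']`, which mirrors the two correct cases in the output above.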
I obtained the following execution times:

    0.390163274941  generator function with groupby()
    0.324547114228  generator function eyquem
    0.310176572721  generator function John Machin
    ['**', '**', 'su', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'su', 'su', '**', 'bin', '*', '*', 'bar', 'bar', 'su', 'su', 'su']
    a==b==c is True
    ['**', 'su', 'foo', '*', 'bar', 'bar', '**', 'su', 'su', '**', 'bin', '*', '*', 'bar', 'bar', 'su', 'su', 'su']
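These figures depend on the machine and the Python version, so they are worth re-measuring. `time.clock()` was removed in Python 3.8, so the sketch below uses `timeit.timeit()` instead; this is a swapped-in way of timing, not the method used for the figures above. It compares the groupby() version with the plain-loop `squeeze()`, both without the hasattr() guard for brevity.

```python
from itertools import groupby
from timeit import timeit

def compress(iterable, to_compress):
    for k, g in groupby(iterable):
        if k in to_compress:
            yield k
        else:
            for x in g:
                yield x

def squeeze(iterable, victims, _dummy=object()):
    previous = _dummy
    for item in iterable:
        if item in victims and item == previous:
            continue
        previous = item
        yield item

p = ['**', '**', 'su', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'su', 'su',
     '**', 'bin', '*', '*', 'bar', 'bar', 'su', 'su', 'su']
victims = ('**', 'sun')

# timeit() calls each lambda `number` times and returns the total in seconds.
t_groupby = timeit(lambda: list(compress(p, victims)), number=10000)
t_loop = timeit(lambda: list(squeeze(p, victims)), number=10000)
print('groupby: %.3f s, plain loop: %.3f s' % (t_groupby, t_loop))
```

No expected figures are given here, since they vary from one run to the next.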
I prefer John Machin's solution because it has no B = iter(B) instruction, unlike mine.
But the previous = _dummy with _dummy = object() seems strange to me. Therefore, I finally think that the best solution is the following code, which works even with a string as the iterable argument and in which the first previous object defined is not a dummy one:
    def squeeze(iterable, victims):
        if hasattr(iterable, '__iter__') and not hasattr(victims, '__iter__'):
            victims = (victims,)
        for item in iterable:
            previous = item
            break
        for item in iterable:
            if item in victims and item==previous:
                continue
            previous = item
            yield item
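One caveat is worth checking before adopting this version. When the iterable is a list or a string, the second for loop starts again from the first item, so a leading victim is compared with itself and skipped. When the iterable is a one-shot iterator, the first item has already been consumed by the first loop. A minimal sketch of a variant that yields the first item explicitly, at the cost of an iter() call; the name `squeeze_first` is hypothetical, and `victims` is assumed to be a container already:

```python
def squeeze_first(iterable, victims):
    """Collapse runs of victims; the first item is always yielded."""
    it = iter(iterable)              # one shared iterator for both loops
    for item in it:
        previous = item
        yield item                   # emit the first item, then leave this loop
        break
    for item in it:                  # continues from the second item
        if item in victims and item == previous:
            continue
        previous = item
        yield item

lead = list(squeeze_first(['**', '**', 'foo', '**', '**'], ('**',)))
one_shot = list(squeeze_first(iter(['a', 'a', 'b']), ('a',)))
empty = list(squeeze_first([], ('**',)))
```

Here `lead` keeps its leading `'**'`, `one_shot` keeps its first `'a'`, and the empty input yields nothing.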
.
EDIT 3
I had understood that object() was used as a sentinel.
But I was puzzled by the fact that object is called. Yesterday I thought that object is something so peculiar that it is impossible for object to be in any iterable passed as an argument to squeeze(). So I wondered why you called it, John Machin, and it raised doubts in my mind about its nature; that is why I asked you to confirm that object is the super-metaclass.
But today, I think I understand why object is called in your code.
In fact, it is possible for object itself to be in the iterable, why not? The super-metaclass object is an object, so nothing prevents it from having been put into an iterable before that iterable is deduplicated, who knows? Using object itself as a sentinel would then be bad practice.
.
So you did not use object itself, but an instance, object(), as the sentinel.
But I wondered why you chose this mysterious thing that a call to object returns.
My thoughts continued on this, and I noticed something that must be the reason for this call:
calling object creates an instance, since object is the most basic class in Python, and each time an instance is created it is a different object from any instance created before, with a value that always differs from the value of any earlier object instance:
    a = object()
    b = object()
    c = object()
    d = object()

    print id(a),'\n',id(b),'\n',id(c),'\n',id(d)

    print a==b,a==c,a==d
    print b==c,b==d,c==d
result
    10818752
    10818760
    10818768
    10818776
    False False False
    False False False
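The False results follow from how plain object instances compare: by default, equality falls back on identity, so an instance is equal only to itself. A small sketch, written to run on Python 2 or 3, with a `_dummy` name mirroring the one in squeeze():

```python
_dummy = object()
other = object()

same = (_dummy == _dummy)        # an instance is equal to itself
different = (_dummy == other)    # two distinct instances do not compare equal
found = _dummy in ['**', 'foo', other]   # never put in this list, so not found
```

Here `same` is True while `different` and `found` are both False.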
Thus, we are sure that _dummy=object() is a unique object that has a unique id and a unique value. By the way, I wonder what the value of an object instance is. In any case, the following code shows a problem with _dummy=object and no problem with _dummy=object():
    def imperfect_squeeze(iterable, victim, _dummy=object):
        previous = _dummy
        print 'id(previous) ==',id(previous)
        print 'id(iterable[0])==',id(iterable[0])
        for item in iterable:
            if item in victim and item==previous:
                continue
            previous = item; yield item

    def squeeze(iterable, victim, _dummy=object()):
        previous = _dummy
        print 'id(previous) ==',id(previous)
        print 'id(iterable[0])==',id(iterable[0])
        for item in iterable:
            if item in victim and item==previous:
                continue
            previous = item; yield item

    wat = object
    li = [wat,'**','**','foo',wat,wat]
    print 'imperfect_squeeze\n''li before ==',li
    print map(id,li)
    li = list(imperfect_squeeze(li,[wat,'**']))
    print 'li after ==',li
    print

    wat = object()
    li = [wat,'**','**','foo',wat,wat]
    print 'squeeze\n''li before ==',li
    print map(id,li)
    li = list(squeeze(li,[wat,'**']))
    print 'li after ==',li
    print

    li = [object(),'**','**','foo',object(),object()]
    print 'squeeze\n''li before ==',li
    print map(id,li)
    li = list(squeeze(li,[li[0],'**']))
    print 'li after ==',li
result
    imperfect_squeeze
    li before == [<type 'object'>, '**', '**', 'foo', <type 'object'>, <type 'object'>]
    [505317320, 18578968, 18578968, 13208848, 505317320, 505317320]
    id(previous) == 505317320
    id(iterable[0])== 505317320
    li after == ['**', 'foo', <type 'object'>]

    squeeze
    li before == [<object object at 0x00A514C8>, '**', '**', 'foo', <object object at 0x00A514C8>, <object object at 0x00A514C8>]
    [10818760, 18578968, 18578968, 13208848, 10818760, 10818760]
    id(previous) == 10818752
    id(iterable[0])== 10818760
    li after == [<object object at 0x00A514C8>, '**', 'foo', <object object at 0x00A514C8>]

    squeeze
    li before == [<object object at 0x00A514D0>, '**', '**', 'foo', <object object at 0x00A514D8>, <object object at 0x00A514E0>]
    [10818768, 18578968, 18578968, 13208848, 10818776, 10818784]
    id(previous) == 10818752
    id(iterable[0])== 10818768
    li after == [<object object at 0x00A514D0>, '**', 'foo', <object object at 0x00A514D8>, <object object at 0x00A514E0>]
The problem is that <type 'object'> is missing as the first element of the list after processing with imperfect_squeeze().
However, we should note that this "problem" is only possible with a list whose FIRST element is object: a lot of thought for such a tiny probability ... but a rigorous coder takes everything into account.
If we use list instead of object, the results are different:
    def imperfect_sqlize(iterable, victim, _dummy=list):
        previous = _dummy
        print 'id(previous) ==',id(previous)
        print 'id(iterable[0])==',id(iterable[0])
        for item in iterable:
            if item in victim and item==previous:
                continue
            previous = item; yield item

    def sqlize(iterable, victim, _dummy=list()):
        previous = _dummy
        print 'id(previous) ==',id(previous)
        print 'id(iterable[0])==',id(iterable[0])
        for item in iterable:
            if item in victim and item==previous:
                continue
            previous = item; yield item

    wat = list
    li = [wat,'**','**','foo',wat,wat]
    print 'imperfect_sqlize\n''li before ==',li
    print map(id,li)
    li = list(imperfect_sqlize(li,[wat,'**']))
    print 'li after ==',li
    print

    wat = list()
    li = [wat,'**','**','foo',wat,wat]
    print 'sqlize\n''li before ==',li
    print map(id,li)
    li = list(sqlize(li,[wat,'**']))
    print 'li after ==',li
    print

    li = [list(),'**','**','foo',list(),list()]
    print 'sqlize\n''li before ==',li
    print map(id,li)
    li = list(sqlize(li,[li[0],'**']))
    print 'li after ==',li
result
    imperfect_sqlize
    li before == [<type 'list'>, '**', '**', 'foo', <type 'list'>, <type 'list'>]
    [505343304, 18578968, 18578968, 13208848, 505343304, 505343304]
    id(previous) == 505343304
    id(iterable[0])== 505343304
    li after == ['**', 'foo', <type 'list'>]

    sqlize
    li before == [[], '**', '**', 'foo', [], []]
    [18734936, 18578968, 18578968, 13208848, 18734936, 18734936]
    id(previous) == 18734656
    id(iterable[0])== 18734936
    li after == ['**', 'foo', []]

    sqlize
    li before == [[], '**', '**', 'foo', [], []]
    [18734696, 18578968, 18578968, 13208848, 18735016, 18734816]
    id(previous) == 18734656
    id(iterable[0])== 18734696
    li after == ['**', 'foo', []]
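The different outcome with list() comes from equality by value: two empty lists compare equal even though they are distinct objects, so a list() default is equal to every [] found in the input, whatever its id. A minimal sketch of that comparison (the variable names are only illustrative):

```python
sentinel_list = list()
sentinel_obj = object()

a, b = [], []
distinct = a is not b                 # two separate list objects
equal_by_value = (a == b)             # but equal, since both are empty
clash = (sentinel_list == a)          # a list() sentinel matches any empty list
no_clash = (sentinel_obj == a)        # an object() sentinel matches nothing else
```

This is why the leading [] disappears in both sqlize runs above, even when its id differs from the id of the default.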
Is there any other object in Python, besides object, that has this feature?
John Machin, why did you choose an instance of object as the sentinel in your generator function? Did you already know about this feature?