Python: remove duplicates for a specific item from a list.

Question

I have a list of items where I want to remove the appearance of any duplicates for one item, but keep the remaining duplicates for the rest. That is, I start with the following list

mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]

I want to remove any duplicate 0, but keep the duplicate 1 and 9. My current solution is as follows:

    mylist = [i for i in mylist if i != 0]
    mylist.append(0)

Is there a good way to keep one occurrence of 0, apart from the following?

    for i in mylist:
        if mylist.count(0) > 1:
            mylist.remove(0)

The second approach takes more than twice as long for this example.

Clarification:

I am not currently interested in preserving the order of the items in the list, as I sort it after creating and cleaning it, but this may change later.
Currently I only need to remove duplicates for one specific item (i.e. 0 in my example).

python list python-3.x

Cryn Apr 7 '18 at 12:29

10 answers

Jean-François Fabre · Answer 1 · 2018-04-07T12:39:43+0000

The solution:

    [0] + [i for i in mylist if i]

looks good enough, unless 0 is not in mylist, in which case you mistakenly add a 0.

Also, adding 2 lists like this is not great performance-wise. I would do:

    newlist = [i for i in mylist if i]
    if len(newlist) != len(mylist):
        # 0 was removed, add it back
        newlist.append(0)

(or use filter: newlist = list(filter(None, mylist)), which may be a little faster because there are no Python-level loops)

Appending to the end of a list is very efficient (the list object over-allocates and does not copy memory most of the time). The length check is O(1) and avoids testing 0 in mylist.
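One caveat worth spelling out: `if i` drops every falsy value (for example `None` or an empty string), not only 0. The sketch below applies the same length-check idea with an explicit comparison instead. The function name `keep_one` and its `target` parameter are made up for illustration and are not part of the answer above.

```python
def keep_one(items, target=0):
    """Return a new list with every `target` removed, then append a single
    `target` back only if at least one was actually removed."""
    result = [x for x in items if x != target]
    if len(result) != len(items):  # something was filtered out
        result.append(target)
    return result


print(keep_one([4, 1, 2, 6, 1, 0, 9, 8, 0, 9]))  # [4, 1, 2, 6, 1, 9, 8, 9, 0]
print(keep_one([4, 1, 2]))                        # [4, 1, 2]
```

As in the answer, the surviving 0 ends up at the end of the list, so this only suits cases where order does not matter.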

jpp · Answer 2 · 2018-04-07T12:35:51+0000

If performance is a problem and you are happy to use a third-party library, use NumPy.

The Python standard library is great for many things. Computing on numeric arrays is not one of them.

    import numpy as np

    mylist = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9])
    mylist = np.delete(mylist, np.where(mylist == 0)[0][1:])

    # array([4, 1, 2, 6, 1, 0, 9, 8, 9])

Here, the first argument to np.delete is the input array. The second argument finds the indices of all occurrences of 0, then selects those from the second occurrence onward.

Benchmark Performance

Tested on Python 3.6.2 / Numpy 1.13.1. Performance will be system and array specific.

    %timeit jp(myarr.copy())         # 183 µs
    %timeit vui(mylist.copy())       # 393 µs
    %timeit original(mylist.copy())  # 1.85 s

    import numpy as np
    from collections import Counter

    myarr = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9] * 1000)
    mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9] * 1000

    def jp(myarr):
        return np.delete(myarr, np.where(myarr == 0)[0][1:])

    def vui(mylist):
        return [0] + list(filter(None, mylist))

    def original(mylist):
        for i in mylist:
            if mylist.count(0) > 1:
                mylist.remove(0)
        return mylist

Daniel Pryden · Answer 3 · 2018-04-07T12:40:04+0000

It sounds like a better data structure for you to use would be collections.Counter (which is in the standard library):

    import collections

    counts = collections.Counter(mylist)
    counts[0] = 1
    mylist = list(counts.elements())
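Two properties of this approach are easy to miss: `elements()` yields equal items grouped together, so the original order is lost, and assigning `counts[0] = 1` unconditionally would introduce a 0 even if the list never contained one. A minimal sketch with a guard added (the guard and the name `deduped` are my additions, not part of the answer):

```python
from collections import Counter

mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]

counts = Counter(mylist)
if counts[0] > 1:   # looking up a missing key returns 0 and does not insert it
    counts[0] = 1   # cap the count only when 0 is actually duplicated

deduped = list(counts.elements())
print(sorted(deduped))  # [0, 1, 1, 2, 4, 6, 8, 9, 9]
```

The result is printed sorted because the asker sorts the list afterwards anyway.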

Ajay · Answer 4 · 2018-04-07T13:30:09+0000

Slicing should do it:

    a[start:end]  # items start through end-1
    a[start:]     # items start through the rest of the list
    a[:end]       # items from the beginning through end-1
    a[:]          # a copy of the whole list

Input:

    mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9, 0, 0, 9, 2, 2]
    pos = mylist.index(0)
    nl = mylist[:pos+1] + [i for i in mylist[pos+1:] if i != 0]
    print(nl)

Output: [4, 1, 2, 6, 1, 0, 9, 8, 9, 9, 2, 2]
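Note that `list.index` raises `ValueError` when the value is absent, so the snippet assumes at least one 0 is present. The same slicing idea can be wrapped in a function that handles that case; `keep_first` and `target` are hypothetical names used only for this sketch.

```python
def keep_first(items, target=0):
    """Keep the first `target` in place and drop every later occurrence."""
    try:
        pos = items.index(target)
    except ValueError:  # target not present: nothing to remove
        return list(items)
    head = items[:pos + 1]                              # up to and including the first target
    tail = [x for x in items[pos + 1:] if x != target]  # later targets filtered out
    return head + tail


print(keep_first([4, 1, 2, 6, 1, 0, 9, 8, 0, 9, 0, 0, 9, 2, 2]))
# [4, 1, 2, 6, 1, 0, 9, 8, 9, 9, 2, 2]
print(keep_first([4, 1, 2]))
# [4, 1, 2]
```

Unlike the append-based answers, this keeps the surviving 0 at its original position.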

mehrdad-pedramfar · Answer 5 · 2018-04-07T13:54:06+0000

You can use this:

    desired_value = 0
    mylist = [i for i in mylist if i != desired_value] + [desired_value]

Now you can change the desired value. You can also handle several values by making it a list, like this:

    desired_value = [0, 6]
    mylist = [i for i in mylist if i not in desired_value] + desired_value

Vuillemot florian · Answer 6 · 2018-04-07T12:46:55+0000

Perhaps you can use filter:

 [0] + list(filter(lambda x: x != 0, mylist))

Kasramvd · Answer 7 · 2018-04-07T12:58:17+0000

Here is a generator-based approach with O(n) complexity, which also preserves the order of the original list:

    In [62]: def remove_dup(lst, item):
        ...:     temp = [item]
        ...:     for i in lst:
        ...:         if i != item:
        ...:             yield i
        ...:         elif i == item and temp:
        ...:             yield temp.pop()
        ...:

    In [63]: list(remove_dup(mylist, 0))
    Out[63]: [4, 1, 2, 6, 1, 0, 9, 8, 9]

Also, if you are dealing with larger lists, you can use the following vectorized approach with NumPy:

    In [80]: arr = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9])
    In [81]: mask = arr == 0
    In [82]: first_ind = np.where(mask)[0][0]
    In [83]: mask[first_ind] = False
    In [84]: arr[~mask]
    Out[84]: array([4, 1, 2, 6, 1, 0, 9, 8, 9])
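The one-element `temp` list in the generator acts as a "not yet emitted" marker. For readers who find that indirect, here is a sketch of the same single-pass idea using a plain boolean flag; `remove_dup_flag` is a hypothetical name, and this is a variation rather than the answer's own code.

```python
def remove_dup_flag(lst, item):
    """Yield the elements of `lst` in order, letting `item` through only once."""
    seen = False
    for x in lst:
        if x != item:
            yield x
        elif not seen:  # first occurrence of `item`
            seen = True
            yield x


print(list(remove_dup_flag([4, 1, 2, 6, 1, 0, 9, 8, 0, 9], 0)))
# [4, 1, 2, 6, 1, 0, 9, 8, 9]
```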

Yasin yousif · Answer 8 · 2018-04-07T13:11:14+0000

Here is a one-liner:

 [x for i,x in enumerate(mylist) if mylist.index(x)==i or x!=0]

Result

 [4, 1, 2, 6, 1, 0, 9, 8, 9]

Ajax1234 · Answer 9 · 2018-04-07T14:00:50+0000

You can use enumerate:

    def remove(l, d):
        return [a for i, a in enumerate(l) if a != d or a not in l[:i]]

    print(remove([4, 1, 2, 6, 1, 0, 9, 8, 0, 9], 0))

Output:

 [4, 1, 2, 6, 1, 0, 9, 8, 9]

Thierry lathuille · Answer 10 · 2018-04-07T14:33:56+0000

You can use an itertools.count counter, which returns 0, 1, ... each time next() is called on it:

    from itertools import count

    mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]

    counter = count()

    # next(counter) will be called each time i == 0
    # it will return 0 the first time, so only the first time
    # will 'not next(counter)' be True
    out = [i for i in mylist if i != 0 or not next(counter)]
    print(out)
    # [4, 1, 2, 6, 1, 0, 9, 8, 9]

The order is preserved, and this can easily be adapted to deduplicate an arbitrary number of values:

    from itertools import count

    mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]

    items_to_dedup = {1, 0}
    counter = {item: count() for item in items_to_dedup}

    out = [i for i in mylist if i not in items_to_dedup or not next(counter[i])]
    print(out)
    # [4, 1, 2, 6, 0, 9, 8, 9]
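For comparison, the multi-value case can also be written as an explicit loop that tracks which of the selected values have already been emitted. This is a sketch under the assumption that the items are hashable; `dedup_selected`, `targets`, and `seen` are illustrative names, not part of the answer.

```python
def dedup_selected(items, targets):
    """Keep only the first occurrence of each value in `targets`;
    leave every other value untouched, duplicates included."""
    targets = set(targets)
    seen = set()
    out = []
    for x in items:
        if x in targets:
            if x in seen:
                continue  # a repeat of a value we are deduplicating
            seen.add(x)
        out.append(x)
    return out


print(dedup_selected([4, 1, 2, 6, 1, 0, 9, 8, 0, 9], {1, 0}))
# [4, 1, 2, 6, 0, 9, 8, 9]
```

It is longer than the comprehension, but the first-occurrence logic is visible without relying on the side effect of `next()` inside a condition.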
