Python: remove duplicates for a specific item from a list.

I have a list of items where I want to remove the duplicates of one particular item, but keep any duplicates of the other items. That is, I start with the following list:

mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9] 

I want to remove any duplicates of 0, but keep the duplicates of 1 and 9. My current solution is as follows:

    mylist = [i for i in mylist if i != 0]
    mylist.append(0)  # lists have no add() method; append() adds the single 0 back

Is there a better way to keep a single occurrence of 0, apart from the following?

    for i in mylist:
        if mylist.count(0) > 1:
            mylist.remove(0)

The second approach takes more than twice as long for this example.

Clarification:

  • I am not currently interested in the order of the items in the list, as I sort it after creating and cleaning it, but this may change later.

  • currently I only need to remove duplicates for one specific item (i.e. 0 in my example)

+5
10 answers

The solution:

 [0] + [i for i in mylist if i] 

looks good enough, unless 0 is not in mylist, in which case you wrongly add a 0.

Also, adding two lists like this is not great performance-wise, since it builds an extra list. I would do:

    newlist = [i for i in mylist if i]
    if len(newlist) != len(mylist):  # 0 was removed, add it back
        newlist.append(0)

(or use filter: newlist = list(filter(None, mylist)), which may be slightly faster because there is no Python-level loop)

Appending to a list at the last position is very efficient (the list object uses pre-allocation and most of the time no memory is copied). The length comparison is O(1) and avoids having to test 0 in mylist.
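
One caveat worth illustrating: the truthiness test (`if i`, or `filter(None, ...)`) drops every falsy element, not only 0, so values such as None or an empty string would disappear as well. The following is a minimal sketch that compares against the target explicitly while keeping the length check; the helper name `keep_one` is hypothetical and not part of the answer above.

```python
def keep_one(items, target=0):
    """Return a new list with at most one occurrence of target, placed at the end."""
    # Explicit comparison: only elements equal to target are dropped, so None
    # or "" survive. (Note that False == 0 and 0.0 == 0 in Python.)
    result = [x for x in items if x != target]
    # If the length changed, at least one target was removed: add one back.
    if len(result) != len(items):
        result.append(target)
    return result


print(keep_one([4, 1, 2, 6, 1, 0, 9, 8, 0, 9]))  # [4, 1, 2, 6, 1, 9, 8, 9, 0]
print(keep_one([4, 1, None, ""]))                 # [4, 1, None, '']
```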

+2

If performance is a problem and you are happy to use a third-party library, use numpy .

The Python standard library is great for many things. Computing on numeric arrays is not one of them.

    import numpy as np

    mylist = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9])
    mylist = np.delete(mylist, np.where(mylist == 0)[0][1:])

    # array([4, 1, 2, 6, 1, 0, 9, 8, 9])

Here, the first argument to np.delete is the input array. The second argument finds the indices of all occurrences of 0, then keeps only the indices from the second occurrence onward, so the first 0 is left in place.
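
To make the indexing step concrete, here is a small sketch that splits the expression into its intermediate values. The variable names `zero_idx` and `to_delete` are introduced only for illustration.

```python
import numpy as np

arr = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9])

zero_idx = np.where(arr == 0)[0]    # indices of every 0 in the array
to_delete = zero_idx[1:]            # all of them except the first
result = np.delete(arr, to_delete)  # new array; arr itself is unchanged

print(zero_idx.tolist())   # [5, 8]
print(to_delete.tolist())  # [8]
print(result.tolist())     # [4, 1, 2, 6, 1, 0, 9, 8, 9]
```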

Benchmark Performance

Tested on Python 3.6.2 / Numpy 1.13.1. Performance will be system and array specific.

    %timeit jp(myarr.copy())         # 183 µs
    %timeit vui(mylist.copy())       # 393 µs
    %timeit original(mylist.copy())  # 1.85 s

    import numpy as np
    from collections import Counter

    myarr = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9] * 1000)
    mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9] * 1000

    def jp(myarr):
        return np.delete(myarr, np.where(myarr == 0)[0][1:])

    def vui(mylist):
        return [0] + list(filter(None, mylist))

    def original(mylist):
        for i in mylist:
            if mylist.count(0) > 1:
                mylist.remove(0)
        return mylist
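
`%timeit` is an IPython magic, so the lines above only run inside IPython or Jupyter. In a plain script, a comparable measurement can be sketched with the standard `timeit` module. The function names below are hypothetical, only the pure-Python variants are covered, and no timings are quoted because they depend on the machine.

```python
import timeit

mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9] * 1000


def filter_based(lst):
    # Same logic as vui above: drop falsy items, then put a single 0 in front.
    return [0] + list(filter(None, lst))


def comprehension_based(lst):
    # Explicit comparison instead of a truthiness test.
    return [0] + [x for x in lst if x != 0]


for func in (filter_based, comprehension_based):
    # repeat() returns one total time per run; the minimum is the least noisy.
    best = min(timeit.repeat(lambda: func(mylist), number=100, repeat=5))
    print(f"{func.__name__}: {best / 100 * 1e6:.1f} µs per call")
```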
+1

It sounds like collections.Counter (which is in the standard library) would be a better data structure for you:

    import collections

    counts = collections.Counter(mylist)
    counts[0] = 1
    mylist = list(counts.elements())
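
Two side effects of this approach are worth noting: assigning `counts[0] = 1` creates a 0 even when the list had none, and `elements()` yields equal items grouped together, so the original order is lost. Here is a hedged sketch with a guard for the first point; the helper name `cap_count` is hypothetical.

```python
from collections import Counter


def cap_count(items, target=0):
    counts = Counter(items)
    # Only cap the count when target occurs more than once; a plain
    # counts[target] = 1 would insert a target that was never there.
    if counts[target] > 1:
        counts[target] = 1
    # elements() yields equal items grouped together, so order is not preserved.
    return list(counts.elements())


print(sorted(cap_count([4, 1, 2, 6, 1, 0, 9, 8, 0, 9])))
# [0, 1, 1, 2, 4, 6, 8, 9, 9]
```

Since the question says the list is sorted afterwards anyway, the loss of order may not matter here.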
+1

Slicing should do it:

    a[start:end]  # items start through end-1
    a[start:]     # items start through the rest of the list
    a[:end]       # items from the beginning through end-1
    a[:]          # a copy of the whole list

Input:

    mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9, 0, 0, 9, 2, 2]
    pos = mylist.index(0)
    nl = mylist[:pos + 1] + [i for i in mylist[pos + 1:] if i != 0]
    print(nl)

Output: [4, 1, 2, 6, 1, 0, 9, 8, 9, 9, 2, 2]
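
Note that `list.index` raises ValueError when the value is absent. The slicing idea can be wrapped in a function that handles that case, as in this minimal sketch (the name `keep_first` is hypothetical):

```python
def keep_first(items, target=0):
    try:
        pos = items.index(target)  # position of the first occurrence
    except ValueError:
        return list(items)         # target absent: nothing to remove
    head = items[:pos + 1]         # everything up to and including the first target
    tail = [x for x in items[pos + 1:] if x != target]
    return head + tail


print(keep_first([4, 1, 2, 6, 1, 0, 9, 8, 0, 9, 0, 0, 9, 2, 2]))
# [4, 1, 2, 6, 1, 0, 9, 8, 9, 9, 2, 2]
print(keep_first([1, 2, 3]))  # [1, 2, 3]
```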

+1

You can use this:

    desired_value = 0
    mylist = [i for i in mylist if i != desired_value] + [desired_value]

Now you can change the desired value. You can also pass several values as a list, like this:

    desired_value = [0, 6]
    mylist = [i for i in mylist if i not in desired_value] + desired_value
+1

Perhaps you can use filter .

 [0] + list(filter(lambda x: x != 0, mylist)) 
0

Here is a generator-based approach with roughly O(n) complexity, which also preserves the order of the original list:

    In [62]: def remove_dup(lst, item):
        ...:     temp = [item]
        ...:     for i in lst:
        ...:         if i != item:
        ...:             yield i
        ...:         elif i == item and temp:
        ...:             yield temp.pop()
        ...:

    In [63]: list(remove_dup(mylist, 0))
    Out[63]: [4, 1, 2, 6, 1, 0, 9, 8, 9]

Also, if you are dealing with larger lists, you can use the following vectorized and optimized approach using Numpy:

    In [80]: arr = np.array([4, 1, 2, 6, 1, 0, 9, 8, 0, 9])

    In [81]: mask = arr == 0

    In [82]: first_ind = np.where(mask)[0][0]

    In [83]: mask[first_ind] = False

    In [84]: arr[~mask]
    Out[84]: array([4, 1, 2, 6, 1, 0, 9, 8, 9])
0

Here is a one-liner:

 [x for i,x in enumerate(mylist) if mylist.index(x)==i or x!=0] 

Result

 [4, 1, 2, 6, 1, 0, 9, 8, 9] 
0

You can use enumerate :

    def remove(l, d):
        return [a for i, a in enumerate(l) if a != d or a not in l[:i]]

    print(remove([4, 1, 2, 6, 1, 0, 9, 8, 0, 9], 0))

Output:

 [4, 1, 2, 6, 1, 0, 9, 8, 9] 
0

You can use an itertools.count counter, which returns 0, 1, ... each time next() is called on it:

    from itertools import count

    mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]

    counter = count()

    # next(counter) will be called each time i == 0
    # it will return 0 the first time, so only the first time
    # will 'not next(counter)' be True
    out = [i for i in mylist if i != 0 or not next(counter)]
    print(out)

    # [4, 1, 2, 6, 1, 0, 9, 8, 9]

The order is preserved, and the code can easily be adapted to deduplicate an arbitrary number of values:

    from itertools import count

    mylist = [4, 1, 2, 6, 1, 0, 9, 8, 0, 9]

    items_to_dedup = {1, 0}
    counter = {item: count() for item in items_to_dedup}

    out = [i for i in mylist if i not in items_to_dedup or not next(counter[i])]
    print(out)

    # [4, 1, 2, 6, 0, 9, 8, 9]
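
For comparison, the same order-preserving behaviour can be sketched with an explicit loop and a set of already-seen values, which some readers may find easier to follow than the `not next(counter)` trick. This is an alternative formulation, not the answer's own code, and the name `dedup_selected` is hypothetical.

```python
def dedup_selected(items, to_dedup):
    seen = set()  # values from to_dedup that have already been emitted once
    out = []
    for x in items:
        if x in to_dedup:
            if x in seen:
                continue  # later occurrence of a value we deduplicate
            seen.add(x)
        out.append(x)
    return out


print(dedup_selected([4, 1, 2, 6, 1, 0, 9, 8, 0, 9], {1, 0}))
# [4, 1, 2, 6, 0, 9, 8, 9]
```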
0

Source: https://habr.com/ru/post/1276293/

