Mark duplicates in a list

Let's say I have a list of names in Python, for example:

names = ['Alice','Bob','Carl','Dave','Bob','Earl','Carl','Frank','Carl']

Now I want to deal with the duplicate names in this list, but I do not want to delete them. Instead, for each name that appears more than once, I want to append a suffix to it, where the suffix is the number of times the name has appeared so far, while preserving the order of the list. Since Carl appears in the list 3 times, I want to be able to refer to the occurrences as Carl_1, Carl_2 and Carl_3, respectively. Thus, in this case, the desired output is as follows:

names = ['Alice','Bob_1','Carl_1','Dave','Bob_2','Earl','Carl_2','Frank','Carl_3']

I can do this by going through the list and changing each name if it needs to be changed, for example, using the following code.

    def mark_duplicates(name_list):
        output = []
        duplicates = {}  # running count for each repeated name
        for name in name_list:
            if name_list.count(name) == 1:
                output.append(name)
            else:
                if name in duplicates:
                    duplicates[name] += 1
                else:
                    duplicates[name] = 1
                output.append(name + "_" + str(duplicates[name]))
        return output

However, this is a lot of work and many lines of code for something I suspect should not be very difficult. Is there an easier way to accomplish this? For example, using something like a list comprehension, or a package like itertools?

5 answers

collections.Counter can reduce the bookkeeping a little:

    In [105]: from collections import Counter
    In [106]: out = []
    In [107]: fullcount = Counter(names)
    In [108]: nc = Counter()
    In [109]: for n in names:
         ...:     nc[n] += 1
         ...:     out.append(n if fullcount[n] == 1 else '{}_{}'.format(n, nc[n]))
         ...:
    In [110]: out
    Out[110]: ['Alice', 'Bob_1', 'Carl_1', 'Dave', 'Bob_2', 'Earl', 'Carl_2', 'Frank', 'Carl_3']
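The question also asks about list comprehensions and itertools. Here is a minimal sketch of the same two-counter idea in that style, assuming the names are hashable. The function name mark_duplicates is borrowed from the question, and the counters dictionary is a name introduced here purely for illustration:

```python
from collections import Counter
from itertools import count


def mark_duplicates(names):
    # First pass: total occurrences of every name.
    totals = Counter(names)
    # One independent 1, 2, 3, ... generator per name that repeats.
    counters = {name: count(1) for name, total in totals.items() if total > 1}
    # Second pass: suffix only the repeated names, keeping the original order.
    return [
        "{}_{}".format(name, next(counters[name])) if name in counters else name
        for name in names
    ]


names = ['Alice', 'Bob', 'Carl', 'Dave', 'Bob', 'Earl', 'Carl', 'Frank', 'Carl']
print(mark_duplicates(names))
# ['Alice', 'Bob_1', 'Carl_1', 'Dave', 'Bob_2', 'Earl', 'Carl_2', 'Frank', 'Carl_3']
```

Both passes are linear in the length of the list, and the input list itself is not modified.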

The following code should do what you are looking for, and it uses a comprehension:

    def get_duplicates(names):
        counts = {k: 0 for k in names}  # occurrences seen so far, per name
        output = []
        for name in names:
            if counts[name] == 0:
                # first occurrence is kept as-is
                output.append(name)
                counts[name] += 1
            else:
                # later occurrences get _1, _2, ...
                output.append("{}_{}".format(name, counts[name]))
                counts[name] += 1
        return output

Update: I fixed the code in my answer to correctly return what the OP is looking for. It is not the best way, but it does not require another library and uses one dict comprehension and one loop.
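Traced by hand, the logic above leaves the first occurrence of a repeated name unsuffixed (Bob, then Bob_1), which differs from the output requested in the question. Below is a minimal sketch that keeps the no-import constraint but numbers every occurrence of a repeated name, at the cost of a second loop. The function name mark_duplicates_no_imports and the totals and seen dictionaries are names introduced here for illustration:

```python
def mark_duplicates_no_imports(names):
    # First loop: count how often each name occurs in total.
    totals = {}
    for name in names:
        totals[name] = totals.get(name, 0) + 1

    # Dict comprehension: occurrences seen so far, starting at zero.
    seen = {name: 0 for name in totals}

    # Second loop: names that occur once pass through unchanged,
    # repeated names get _1, _2, ... in order of appearance.
    output = []
    for name in names:
        if totals[name] == 1:
            output.append(name)
        else:
            seen[name] += 1
            output.append("{}_{}".format(name, seen[name]))
    return output


names = ['Alice', 'Bob', 'Carl', 'Dave', 'Bob', 'Earl', 'Carl', 'Frank', 'Carl']
print(mark_duplicates_no_imports(names))
# ['Alice', 'Bob_1', 'Carl_1', 'Dave', 'Bob_2', 'Earl', 'Carl_2', 'Frank', 'Carl_3']
```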


If you don't care about the original order, you can think of it this way:

  • Count the number of occurrences of each name
  • Build a list in which a name that appears only once is left unchanged, while a name that appears more than once gets _1, _2, ... appended to its second and subsequent appearances.

This means that you can use collections.Counter to complete the task:

    import collections

    names = ['Alice', 'Bob', 'Carl', 'Dave', 'Bob', 'Earl', 'Carl', 'Frank', 'Carl']
    counter = collections.Counter(names)
    print("Counter: %s" % counter)

    result = []
    for name, counts in counter.iteritems():  # Python 2; use counter.items() on Python 3
        result.append(name)
        for i in range(1, counts):
            result.append("%s_%d" % (name, i))
    print(result)

Which outputs:

    Counter: Counter({'Carl': 3, 'Bob': 2, 'Earl': 1, 'Frank': 1, 'Alice': 1, 'Dave': 1})
    ['Earl', 'Frank', 'Alice', 'Dave', 'Carl', 'Carl_1', 'Carl_2', 'Bob', 'Bob_1']

If you want to add the suffix _1, _2, ... to every occurrence of a name that appears more than once in the list, but leave names that appear only once unchanged, you can do:

    import collections

    names = ['Alice', 'Bob', 'Carl', 'Dave', 'Bob', 'Earl', 'Carl', 'Frank', 'Carl']
    counter = collections.Counter(names)
    print("Counter: %s" % counter)

    result = []
    for name, counts in counter.iteritems():  # Python 2; use counter.items() on Python 3
        if counts == 1:
            result.append(name)
        else:
            for i in range(counts):
                result.append("%s_%d" % (name, i + 1))
    print(result)

Which outputs:

    Counter: Counter({'Carl': 3, 'Bob': 2, 'Earl': 1, 'Frank': 1, 'Alice': 1, 'Dave': 1})
    ['Earl', 'Frank', 'Alice', 'Dave', 'Carl_1', 'Carl_2', 'Carl_3', 'Bob_1', 'Bob_2']

If ['Alice', 'Bob', 'Carl', 'Dave', 'Bob_2', 'Earl', 'Carl_2', 'Frank', 'Carl_3'] is a valid output (the first occurrence not having _1 added), I would suggest the following:

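    counts = {}

    def append(name):
        # Returns True if name was seen before, False on its first occurrence.
        try:
            counts[name] += 1
            return True
        except KeyError:
            counts[name] = 1
            return False

    def get_duplicates():
        # names is the module-level list from the question
        return ['_'.join([name, str(counts[name])]) if append(name) else name for name in names]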

The advantage of this approach is that it goes through names only once. The trade-off is that, on reaching the first occurrence of a name, it cannot know whether more will follow.


To meet the specification, append can be modified further:

    def append(name):
        if names.count(name) != 1:
            try:
                counts[name] += 1
            except KeyError:
                counts[name] = 1
            return True
        else:
            return False

which will give the expected result:

 ['Alice', 'Bob_1', 'Carl_1', 'Dave', 'Bob_2', 'Earl', 'Carl_2', 'Frank', 'Carl_3'] 
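Calling names.count for every element rescans the list each time, so the single-pass property is lost. One way to keep a single pass and still produce the requested output is to remember where each name first landed and rename that entry once a second occurrence turns up. This is a minimal sketch of that idea, not the answer's own method. The function name mark_duplicates_single_pass and the first_index dictionary are hypothetical names introduced here:

```python
def mark_duplicates_single_pass(names):
    output = []
    counts = {}       # name -> occurrences seen so far
    first_index = {}  # name -> position of its first occurrence in output
    for name in names:
        counts[name] = counts.get(name, 0) + 1
        if counts[name] == 1:
            # First sighting: emit unchanged, but remember where it went.
            first_index[name] = len(output)
            output.append(name)
        else:
            if counts[name] == 2:
                # Second sighting: go back and mark the first one as _1.
                output[first_index[name]] = name + "_1"
            output.append("{}_{}".format(name, counts[name]))
    return output


names = ['Alice', 'Bob', 'Carl', 'Dave', 'Bob', 'Earl', 'Carl', 'Frank', 'Carl']
print(mark_duplicates_single_pass(names))
# ['Alice', 'Bob_1', 'Carl_1', 'Dave', 'Bob_2', 'Earl', 'Carl_2', 'Frank', 'Carl_3']
```

The back-patching only works because the whole output list is still in memory, so it would not carry over to a streaming setting where results are emitted immediately.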

Another solution using enumerate:

    >>> names = ['Alice','Bob','Carl','Dave','Bob','Earl','Carl','Frank','Carl']
    >>> processed = []
    >>> for n in names:
    ...     if n not in processed:
    ...         indices = [i for i,name in enumerate(names) if name == n]
    ...         if len(indices) > 1:
    ...             suffix = 1
    ...             for i in indices:
    ...                 names[i] = "{}_{}".format(names[i], suffix)
    ...                 suffix += 1
    ...     if n.split('_')[0] not in processed:
    ...         processed.append(n)
    ...
    >>>
    >>> names
    ['Alice', 'Bob_1', 'Carl_1', 'Dave', 'Bob_2', 'Earl', 'Carl_2', 'Frank', 'Carl_3']

Source: https://habr.com/ru/post/1258769/

