Removing duplicate JSON objects from a list in python

I have a dict list where a specific value is repeated several times, and I would like to remove duplicate values.

My list:

    te = [
        {"Name": "Bala", "phone": "None"},
        {"Name": "Bala", "phone": "None"},
        {"Name": "Bala", "phone": "None"},
        {"Name": "Bala", "phone": "None"}
    ]

To remove the duplicate values:

    def removeduplicate(it):
        seen = set()
        for x in it:
            if x not in seen:
                yield x
                seen.add(x)

When I call this function, I get a generator object:

 <generator object removeduplicate at 0x0170B6E8> 

When I try to iterate over the generator, I get TypeError: unhashable type: 'dict'.

Is there a way to remove the duplicate values, or to iterate over the generator?

+5
3 answers

You can easily remove the duplicates by exploiting the fact that a dictionary cannot contain duplicate keys, as shown below:

    te = [
        {"Name": "Bala", "phone": "None"},
        {"Name": "Bala", "phone": "None"},
        {"Name": "Bala", "phone": "None"},
        {"Name": "Bala", "phone": "None"},
        {"Name": "Bala1", "phone": "None"}
    ]
    unique = list({each['Name']: each for each in te}.values())
    print(unique)

Output:

 [{'phone': 'None', 'Name': 'Bala1'}, {'phone': 'None', 'Name': 'Bala'}] 
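Note that keying on each['Name'] treats two records with the same name but different phone numbers as duplicates, keeping only the last one per name. A minimal sketch (my addition, not part of the answer above) that instead deduplicates on the full record, assuming all values are hashable:

```python
te = [
    {"Name": "Bala", "phone": "None"},
    {"Name": "Bala", "phone": "None"},
    {"Name": "Bala", "phone": "1234"},  # same name, different phone
]

# Key on every (key, value) pair, sorted so key order inside the dict
# does not matter; records that differ in any field are kept.
unique = list({tuple(sorted(d.items())): d for d in te}.values())
print(unique)  # two distinct records remain
```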
+7

This happens because you cannot add a dict to a set. From this question:

You are trying to use a dict as a key to another dict or in a set. That does not work because keys have to be hashable.

As a general rule, only immutable objects (strings, integers, floats, frozensets, tuples of immutables) are hashable (though exceptions are possible).

    >>> foo = dict()
    >>> bar = set()
    >>> bar.add(foo)
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    TypeError: unhashable type: 'dict'

Since you are already checking if x not in seen, you can simply use a list instead:

    >>> te = [
    ...     {"Name": "Bala", "phone": "None"},
    ...     {"Name": "Bala", "phone": "None"},
    ...     {"Name": "Bala", "phone": "None"},
    ...     {"Name": "Bala", "phone": "None"}
    ... ]
    >>> def removeduplicate(it):
    ...     seen = []
    ...     for x in it:
    ...         if x not in seen:
    ...             yield x
    ...             seen.append(x)
    ...
    >>> removeduplicate(te)
    <generator object removeduplicate at 0x7f3578c71ca8>
    >>> list(removeduplicate(te))
    [{'phone': 'None', 'Name': 'Bala'}]
+2

You can still use a set for duplicate detection; you just need to convert each dictionary to something hashable, such as a tuple. A dictionary d can be converted to a tuple with tuple(d.items()). Applying this to your generator function:

    def removeduplicate(it):
        seen = set()
        for x in it:
            t = tuple(x.items())
            if t not in seen:
                yield x
                seen.add(t)

    >>> for d in removeduplicate(te):
    ...     print(d)
    {'phone': 'None', 'Name': 'Bala'}
    >>> te.append({'Name': 'Bala', 'phone': '1234567890'})
    >>> te.append({'Name': 'Someone', 'phone': '1234567890'})
    >>> for d in removeduplicate(te):
    ...     print(d)
    {'phone': 'None', 'Name': 'Bala'}
    {'phone': '1234567890', 'Name': 'Bala'}
    {'phone': '1234567890', 'Name': 'Someone'}

This gives faster lookups (O(1) on average) than the "seen" list (O(n)). Whether the extra cost of converting each dict to a tuple is worth it depends on how many dictionaries and how many duplicates you have. If there are many duplicates, the "seen" list grows quite large, and checking whether a dict has already been seen becomes an expensive operation. That may justify the tuple conversion; you would have to measure or profile it.
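One caveat worth noting (my addition, not part of the answer above): tuple(x.items()) is sensitive to key insertion order, so two dicts with equal contents built in a different key order produce different tuples and would both be yielded. Sorting the items first makes the set key order-independent. A minimal sketch:

```python
def removeduplicate(it):
    seen = set()
    for x in it:
        # Sort the (key, value) pairs so that key insertion order
        # inside the dict does not affect duplicate detection.
        t = tuple(sorted(x.items()))
        if t not in seen:
            yield x
            seen.add(t)

a = {"Name": "Bala", "phone": "None"}
b = {"phone": "None", "Name": "Bala"}  # same content, different key order
print(list(removeduplicate([a, b])))   # only one dict survives
```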

+1

Source: https://habr.com/ru/post/1236956/

