Removing dicts with a duplicate value from a list of dicts in Python

I have a list of dicts as follows:

[{'ppm_error': -5.441115144810845e-07, 'key': 'Y7', 'obs_ion': 1054.5045550349998},
{'ppm_error': 2.3119997582222951e-07, 'key': 'Y9', 'obs_ion': 1047.547178035},
{'ppm_error': 2.3119997582222951e-07, 'key': 'Y9', 'obs_ion': 1381.24928035},
{'ppm_error': -2.5532659838679713e-06, 'key': 'Y4', 'obs_ion': 741.339467035},
{'ppm_error': 1.3036219678359603e-05, 'key': 'Y10', 'obs_ion': 1349.712302035},
{'ppm_error': 3.4259216556970878e-06, 'key': 'Y6', 'obs_ion': 941.424286035},
{'ppm_error': 1.1292770047090912e-06, 'key': 'Y2', 'obs_ion': 261.156025035},
{'ppm_error': 1.1292770047090912e-06, 'key': 'Y2', 'obs_ion': 389.156424565},
{'ppm_error': 9.326980606898406e-06, 'key': 'Y5', 'obs_ion': 667.3107950350001}
]

I want to remove dicts with duplicate keys, so that only dicts with a unique "key" remain. It doesn't matter which of the duplicates ends up in the final list. Therefore, the final list should look like this:

[{'ppm_error': -5.441115144810845e-07, 'key': 'Y7', 'obs_ion': 1054.5045550349998},
{'ppm_error': 2.3119997582222951e-07, 'key': 'Y9', 'obs_ion': 1381.24928035},
{'ppm_error': -2.5532659838679713e-06, 'key': 'Y4', 'obs_ion': 741.339467035},
{'ppm_error': 1.3036219678359603e-05, 'key': 'Y10', 'obs_ion': 1349.712302035},
{'ppm_error': 3.4259216556970878e-06, 'key': 'Y6', 'obs_ion': 941.424286035},
{'ppm_error': 1.1292770047090912e-06, 'key': 'Y2', 'obs_ion': 261.156025035},
{'ppm_error': 9.326980606898406e-06, 'key': 'Y5', 'obs_ion': 667.3107950350001}
]

Is it possible to use the itertools.groupby function for this, or is there another way to approach this problem? Any suggestions?


If order matters, you can use collections.OrderedDict to collect the elements, like this:

from collections import OrderedDict
print(list(OrderedDict((d["key"], d) for d in my_list).values()))

If order doesn't matter, a plain dict comprehension is enough:

print(list({d["key"]: d for d in my_list}.values()))
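One subtlety worth noting (my addition, not part of the original answer): in both one-liners, later duplicates overwrite earlier ones, so the *last* dict seen for each key survives. Since the question says it doesn't matter which duplicate remains, either behavior is acceptable. A minimal sketch:

```python
from collections import OrderedDict

my_list = [
    {'key': 'Y9', 'obs_ion': 1047.547178035},
    {'key': 'Y9', 'obs_ion': 1381.24928035},
]

# duplicate keys overwrite earlier entries, so the last dict per key survives
result = list(OrderedDict((d['key'], d) for d in my_list).values())
# result holds a single dict, the second 'Y9' entry
```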

You can filter with a closure that remembers which keys have already been seen (a form of memoization):

def get_key_watcher():
    keys_seen = set()
    def key_not_seen(d):
        key = d['key']
        if key in keys_seen:
            return False  # key is not new
        else:
            keys_seen.add(key)
            return True  # key seen for the first time
    return key_not_seen

Usage:

>>> filtered_dicts = list(filter(get_key_watcher(), dicts))
>>> filtered_dicts
[{'ppm_error': -5.441115144810845e-07, 'obs_ion': 1054.5045550349998, 'key': 'Y7'},
 {'ppm_error': 2.3119997582222951e-07, 'obs_ion': 1047.547178035, 'key': 'Y9'},
 {'ppm_error': -2.5532659838679713e-06, 'obs_ion': 741.339467035, 'key': 'Y4'},
 {'ppm_error': 1.3036219678359603e-05, 'obs_ion': 1349.712302035, 'key': 'Y10'},
 {'ppm_error': 3.4259216556970878e-06, 'obs_ion': 941.424286035, 'key': 'Y6'},
 {'ppm_error': 1.1292770047090912e-06, 'obs_ion': 261.156025035, 'key': 'Y2'},
 {'ppm_error': 9.326980606898406e-06, 'obs_ion': 667.3107950350001, 'key': 'Y5'}]

This keeps the first dict seen for each key; since the question says it doesn't matter which duplicate survives, that is fine.


I would do it like this:

my_list = [...]  # your list (avoid naming it "list", which shadows the built-in)

final_list = list(dict(map(lambda x: (x['key'], x), my_list)).values())

Basically this is the same solution that @thefourtheye gives in his answer ...


Convert it to a numpy array:

import numpy

labels = ["ppm_error", "key", "obs_ion"]
a = numpy.array([(d["ppm_error"], d["key"], d["obs_ion"]) for d in my_dicts])
# unique() with return_index=True yields the index of the first
# occurrence of each distinct value in the "key" column
mask = numpy.unique(a[:, 1], return_index=True)[1]
uniques = a[mask]

then go back to dicts:

unique_entries = [dict(zip(labels, row)) for row in uniques]

Note that building a mixed-type numpy array coerces every value to a string, so the numeric fields come back as strings rather than floats.
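To answer the itertools.groupby part of the question directly: groupby can be used, but it only groups *consecutive* equal elements, so the list must be sorted on the grouping key first, which does not preserve the original order. A sketch (my addition, not from the original answers):

```python
from itertools import groupby
from operator import itemgetter

my_list = [
    {'ppm_error': -5.441115144810845e-07, 'key': 'Y7', 'obs_ion': 1054.5045550349998},
    {'ppm_error': 2.3119997582222951e-07, 'key': 'Y9', 'obs_ion': 1047.547178035},
    {'ppm_error': 2.3119997582222951e-07, 'key': 'Y9', 'obs_ion': 1381.24928035},
]

# sort by 'key' so groupby sees equal keys as consecutive runs,
# then take the first dict from each run
by_key = itemgetter('key')
unique = [next(group) for _, group in
          groupby(sorted(my_list, key=by_key), key=by_key)]
```

The dict-based answers above are simpler and keep the original order; groupby is mainly worth it if you already need the data sorted or want to process whole groups.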
0
source

Source: https://habr.com/ru/post/1535132/
