How to remove unique dictionaries from a list and reduce the duplicates to single copies?

Given the following list, which contains some duplicate and some unique dictionaries, what is the best method for deleting the unique dictionaries and then reducing the duplicate dictionaries to single copies? I have to say that I just started doing Python, but it is making this project a lot easier. I'm just a little stumped on this issue.

So my list looks like this:

[{  'file': u'/file.txt',
    'line': u'line 666',
    'rule': u'A DUPLICATE RULE'}

{   'file': u'/file.txt',
    'line': u'line 666',
    'rule': u'A DUPLICATE RULE'}

{   'file': u'/uniquefile.txt',
    'line': u'line 999',
    'rule': u'A UNIQUE RULE'}]

What I'm aiming for is that, at the end, the list should look like this:

[{  'file': u'/file.txt',
    'line': u'line 666',
    'rule': u'A DUPLICATE RULE'}]
+3
source share
7 answers

One idea is to sort the data. Assume `inputdata` is your list above:

from itertools import groupby
from operator import itemgetter

# Sort so that equal dicts end up adjacent; groupby() then collects runs
# of equal items, and runs longer than 1 are the duplicates we keep.
# NOTE: itemgetter(*inputdata[0]) builds the sort key from the keys of the
# first dict — this assumes all dicts share the same keys.
inputdata.sort(key=itemgetter(*inputdata[0]))  # ensures equal dicts are adjacent
print([k for k, g in groupby(inputdata) if len(list(g)) > 1])

prints:

[{'line': u'line 666', 'file': u'/file.txt', 'rule': u'A DUPLICATE RULE'}]
+4
source

Dicts are not hashable, so they cannot be put straight into a set.

One option is to wrap each record in a small class, for example:

class rule(object):
    """A hashable, orderable wrapper around one (file, line, rule) record.

    Defining __eq__, __lt__ and __hash__ lets instances be sorted,
    compared and stored in sets/dicts.  (The original Python 2 version
    used __cmp__ and the cmp() builtin, both of which were removed in
    Python 3.)
    """

    def __init__(self, file, line, rule):
        self.file = file
        self.line = line
        self.rule = rule

    # Not a "magic" method, just a helper for all the methods below :)
    def _tuple_(self):
        return (self.file, self.line, self.rule)

    def __eq__(self, other):
        return self._tuple_() == rule._tuple_(other)

    # __lt__ is enough for sort() / sorted() in Python 3.
    def __lt__(self, other):
        return self._tuple_() < rule._tuple_(other)

    def __hash__(self):
        return hash(self._tuple_())

    def __repr__(self):
        return repr(self._tuple_())

Now, assuming your list of dicts is called ruledict_list, build the objects and sort them:

# Wrap every dict from ruledict_list in a rule instance, then sort so
# that equal rules end up adjacent (required by the duplicate scan that
# follows).
rules = [rule(**r) for r in ruledict_list]
rules.sort()

After sorting, equal rules are adjacent. Walk the list: a run of equal items is a set of duplicates, while a lone item is unique. The loop below keeps one copy of each duplicated rule and drops the unique ones.

# Scan the sorted rules.  The inner loop finds the end of the run of
# items equal to rules[pos].  A run of length 1 is a unique item and is
# removed entirely; a longer run is collapsed to a single copy.
# (The original version popped one element per outer iteration, which
# left n-1 copies of any run longer than two instead of a single copy;
# it also used the Python 2 print statement.)
pos = 0
while pos < len(rules):
    end = pos
    while end < len(rules) - 1 and rules[end] == rules[end + 1]:
        print("Skipping rule %s" % rules[end])
        end += 1
    if end == pos:
        rules.pop(pos)               # unique item: drop it
    else:
        del rules[pos + 1:end + 1]   # duplicated run: keep one copy
        pos += 1
rule_set = set(rules)
+2

Note that it is the sort that brings equal items together. (In Python 2, dicts can be compared and sorted directly, which is what makes this work.) Each run of length greater than 1 is a set of duplicates.

Also note that in Python 3 dictionaries can no longer be ordered directly, so sorting them requires an explicit key function. (This answer was written for Python 2.)

+1

Another approach, counting occurrences with a dictionary:

from operator import itemgetter
from collections import defaultdict

# Count occurrences of each dict.  Dicts are unhashable, so a frozenset
# of the (key, value) pairs serves as a hashable stand-in (this assumes
# all the values are themselves hashable).
# Python 3: dict.iteritems() no longer exists — use dict.items().
counter = defaultdict(int)
for d in inputdata:
    counter[frozenset(d.items())] += 1

# Keep only the dicts that were seen more than once, rebuilt from their
# item pairs.
result = [dict(item) for item, count in counter.items() if count > 1]
print(result)

This keeps a single copy of every dictionary that occurred more than once, which is exactly the requested output.

+1
>>> import itertools
>>> # Python 2 only: sorted() on a list of dicts raises TypeError in Python 3.
>>> list(a[0] for a in itertools.groupby(sorted(data)) if len(list(a[1])) > 1)
[{'file': u'/file.txt', 'line': u'line 666', 'rule': u'A DUPLICATE RULE'}]

Note that this consumes each group eagerly via len(list(a[1])) just to count it.

Edit: see the improvement suggested in the following answer.

+1

Nice solution. Note that sorted() is what puts equal items next to each other, so that groupby() can collect them.

, : ", , len (list (a [1]))". . , , .next() . , , ; StopIteration .next(), . ( , itertools.groupby, , .)

Also, rather than indexing the (key, group) tuple as a[0] and a[1], unpack it into named variables, which makes the intent much clearer.

So, without materializing the groups with list(), the solution looks like this:

# Sample input: two identical "DUPLICATE" records and one unique record.
data = [
    {'file': u'/file.txt', 'line': u'line 666', 'rule': u'A DUPLICATE RULE'},
    {'file': u'/uniquefile.txt', 'line': u'line 999', 'rule': u'A UNIQUE RULE'},
    {'file': u'/file.txt', 'line': u'line 666', 'rule': u'A DUPLICATE RULE'},
]

from itertools import groupby

def notunique(itr):
    """Return True if *itr* yields at least two items, else False.

    Used on a groupby() group iterator: a group with two or more members
    is a duplicate.  Consumes at most two items, so the whole group is
    never materialized.  Python 3: the iterator .next() method was
    removed — use the next() builtin instead.
    """
    try:
        next(itr)
        next(itr)
        return True
    except StopIteration:
        return False

def unique_list(lst):
    """Return one copy of each dict in *lst* that occurs more than once.

    Sorts with an explicit key because dicts themselves are not
    orderable in Python 3; equal dicts still end up adjacent, which is
    all groupby() needs.
    """
    keyed = sorted(lst, key=lambda d: sorted(d.items()))
    return [key for key, itr in groupby(keyed) if notunique(itr)]

print(unique_list(data))
+1

Another option is to replace each dict with an instance of a class that implements __cmp__, __eq__ and __hash__, so the instances can be compared and stored in a set.

Here is one possible implementation, although I do not make promises about the quality of the hash procedure that I have provided:

class Thing(object):
    """Hashable, orderable record of (file, line, rule).

    Python 3 replacement for the original cmp()/__cmp__-based version:
    equality, ordering and hashing are all derived from one attribute
    tuple so they can never disagree.
    """

    def __init__(self, file, line, rule):
        self.file = file
        self.line = line
        self.rule = rule

    def _key(self):
        # Single source of truth for ordering, equality and hashing.
        return (self.file, self.line, self.rule)

    def __lt__(self, other):
        return self._key() < other._key()

    def __eq__(self, other):
        return self._key() == other._key()

    def __hash__(self):
        # Hash the tuple; multiplying the individual field hashes (as the
        # original did) collapses to 0 whenever any field hashes to 0.
        return hash(self._key())

    def __str__(self):
        return ', '.join([self.file, self.line, self.rule])

# Same three records as the example data: two duplicates, one unique.
things = [
    Thing(u'/file.txt', u'line 666', u'A DUPLICATE RULE'),
    Thing(u'/file.txt', u'line 666', u'A DUPLICATE RULE'),
    Thing(u'/uniquefile.txt', u'line 999', u'A UNIQUE RULE'),
]

# Single pass: an item seen before goes into duplicate_things,
# a new item into unique_things.
duplicate_things = set()
unique_things = set()
for thing in things:
    target = duplicate_things if thing in unique_things else unique_things
    target.add(thing)

If you need to return to the list, just create it from the result set:

# Materialize the result sets as lists (element order is arbitrary,
# since sets are unordered).
unique_things = list(unique_things)
duplicate_things = list(duplicate_things)

This is a bit more code to create your own class like this, but may give you other options in the future if your program gets complicated.

Edit

Ok, my hands are faster than my eyes today, but I think this editing solves the problem pointed out by @nosklo

0
source

Source: https://habr.com/ru/post/1722753/


All Articles