I build large lists of high-level objects while parsing a tree. After that step I need to remove duplicates from the list, and I found this deduplication very slow in Python 2 (it was acceptable, though still a bit slow, in Python 3). However, I know that distinct objects have distinct identifiers (as returned by id()). Using that fact, I was able to get much faster code with these steps:
- add all objects to the list during parsing;
- sort the list with key=id;
- iterate over the sorted list and remove each item whose predecessor has the same identifier.
So I have working code that now runs quickly, but I wonder whether I can accomplish this task more directly in Python.
Example. Let two objects be built with the same value but different identities (I'll use fractions.Fraction so the example relies only on the standard library):
from fractions import Fraction
a = Fraction(1,3)
b = Fraction(1,3)
Now, if I try to accomplish this with the pythonic list(set(...)), I get the wrong result, because the set {a, b} keeps only one of the two objects (they are equal in value but have different identifiers).
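To make that failure mode concrete, here is the example above with assertions showing why a value-based set is the wrong tool here:

```python
from fractions import Fraction

a = Fraction(1, 3)
b = Fraction(1, 3)

assert a == b                    # equal by value
assert a is not b                # but two distinct objects
assert len({a, b}) == 1          # a set collapses them into one
assert len(list(set([a, b]))) == 1  # so list(set(...)) loses an object
```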
So my question is: what is the most pythonic, reliable, short and fast way to remove duplicates by identity rather than duplicates by value? The order of the list does not matter and may be changed if necessary.