I build large lists of high-level objects while parsing a tree. After that step I need to remove duplicates from the list, and I found this deduplication very slow in Python 2 (it was acceptable, though still a bit slow, in Python 3). However, I know that distinct objects have distinct identifiers (as returned by id()). Using that fact, I was able to get much faster code with these steps:
- add all objects to the list during parsing;
- sort the list with key=id;
- iterate over the sorted list and remove each item whose predecessor has the same identifier.
So I have working code that now runs quickly, but I wonder whether I can accomplish this task more directly in Python.
Example. Let two objects be built with the same value but different identities (I'll use fractions.Fraction so the example relies only on the standard library):
from fractions import Fraction
a = Fraction(1,3)
b = Fraction(1,3)
Now, if I try to accomplish this with the pythonic list(set(...)), I get the wrong result, because the set {a, b} keeps only one of the two objects (they are equal in value but have different identifiers).
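To make that failure mode concrete, here is the example above with assertions showing why a value-based set is the wrong tool here:

```python
from fractions import Fraction

a = Fraction(1, 3)
b = Fraction(1, 3)

assert a == b                    # equal by value
assert a is not b                # but two distinct objects
assert len({a, b}) == 1          # a set collapses them into one
assert len(list(set([a, b]))) == 1  # so list(set(...)) loses an object
```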
So my question is: what is the most pythonic, reliable, short and fast way to remove duplicates by identity rather than duplicates by value? The order of the list does not matter and may be changed if necessary.