How to remove duplicates in a list of objects without __hash__

I have a list of user-defined objects from which I want to remove duplicates. The usual way to do this is to define both __eq__ and __hash__ for the objects and then build a set from the list. I defined __eq__, but I cannot find a good way to implement __hash__ so that it returns the same value for equal objects.
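As a minimal sketch of that usual pattern, assuming a hypothetical `User` class (not part of the code in this question) whose identity is a single `email` field: equal objects must produce equal hashes, so `__hash__` is derived from exactly the data that `__eq__` compares.

```python
class User(object):
    """Hypothetical example class: two users are equal if their emails match."""

    def __init__(self, name, email):
        self.name = name
        self.email = email

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return self.email == other.email

    def __hash__(self):
        # Hash only the field that __eq__ compares, so equal objects
        # always land in the same hash bucket.
        return hash(self.email)


users = [
    User("Ann", "ann@example.org"),
    User("Ann B.", "ann@example.org"),  # duplicate of the first by email
    User("Bob", "bob@example.org"),
]
unique_users = set(users)
print(len(unique_users))  # 2
```

The difficulty in this question is that no such simple field exists: equality is defined through a distance between two objects rather than through a value stored on each one.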

In particular, I have a class derived from the Tree class of the ete3 toolkit. I define two objects as equal if their Robinson-Foulds distance is zero.

    from ete3 import Tree

    class MyTree(Tree):

        def __init__(self, *args, **kwargs):
            super(MyTree, self).__init__(*args, **kwargs)

        def __eq__(self, other):
            rf = self.robinson_foulds(other, unrooted_trees=True)
            return not bool(rf[0])

    newicks = ['((D, C), (A, B),(E));',
               '((D, B), (A, C),(E));',
               '((D, A), (B, C),(E));',
               '((C, D), (A, B),(E));',
               '((C, B), (A, D),(E));',
               '((C, A), (B, D),(E));',
               '((B, D), (A, C),(E));',
               '((B, C), (A, D),(E));',
               '((B, A), (C, D),(E));',
               '((A, D), (B, C),(E));',
               '((A, C), (B, D),(E));',
               '((A, B), (C, D),(E));']

    trees = [MyTree(newick) for newick in newicks]

    print len(trees)       # 12
    print len(set(trees))  # also 12, not what I want!

Both len(trees) and len(set(trees)) are 12, which is not what I want, because several of the objects are equal to each other:

    from itertools import product

    for t1, t2 in product(newicks, repeat=2):
        if t1 != t2:
            mt1 = MyTree(t1)
            mt2 = MyTree(t2)
            if mt1 == mt2:
                print t1, '==', t2

which prints:

    ((D, C), (A, B),(E)); == ((C, D), (A, B),(E));
    ((D, C), (A, B),(E)); == ((B, A), (C, D),(E));
    ((D, C), (A, B),(E)); == ((A, B), (C, D),(E));
    ((D, B), (A, C),(E)); == ((C, A), (B, D),(E));
    ((D, B), (A, C),(E)); == ((B, D), (A, C),(E));
    ((D, B), (A, C),(E)); == ((A, C), (B, D),(E));
    ((D, A), (B, C),(E)); == ((C, B), (A, D),(E));
    ((D, A), (B, C),(E)); == ((B, C), (A, D),(E));
    ((D, A), (B, C),(E)); == ((A, D), (B, C),(E));
    ((C, D), (A, B),(E)); == ((D, C), (A, B),(E));
    ((C, D), (A, B),(E)); == ((B, A), (C, D),(E));
    ((C, D), (A, B),(E)); == ((A, B), (C, D),(E));
    ((C, B), (A, D),(E)); == ((D, A), (B, C),(E));
    ((C, B), (A, D),(E)); == ((B, C), (A, D),(E));
    ((C, B), (A, D),(E)); == ((A, D), (B, C),(E));
    ((C, A), (B, D),(E)); == ((D, B), (A, C),(E));
    ((C, A), (B, D),(E)); == ((B, D), (A, C),(E));
    ((C, A), (B, D),(E)); == ((A, C), (B, D),(E));
    ((B, D), (A, C),(E)); == ((D, B), (A, C),(E));
    ((B, D), (A, C),(E)); == ((C, A), (B, D),(E));
    ((B, D), (A, C),(E)); == ((A, C), (B, D),(E));
    ((B, C), (A, D),(E)); == ((D, A), (B, C),(E));
    ((B, C), (A, D),(E)); == ((C, B), (A, D),(E));
    ((B, C), (A, D),(E)); == ((A, D), (B, C),(E));
    ((B, A), (C, D),(E)); == ((D, C), (A, B),(E));
    ((B, A), (C, D),(E)); == ((C, D), (A, B),(E));
    ((B, A), (C, D),(E)); == ((A, B), (C, D),(E));
    ((A, D), (B, C),(E)); == ((D, A), (B, C),(E));
    ((A, D), (B, C),(E)); == ((C, B), (A, D),(E));
    ((A, D), (B, C),(E)); == ((B, C), (A, D),(E));
    ((A, C), (B, D),(E)); == ((D, B), (A, C),(E));
    ((A, C), (B, D),(E)); == ((C, A), (B, D),(E));
    ((A, C), (B, D),(E)); == ((B, D), (A, C),(E));
    ((A, B), (C, D),(E)); == ((D, C), (A, B),(E));
    ((A, B), (C, D),(E)); == ((C, D), (A, B),(E));
    ((A, B), (C, D),(E)); == ((B, A), (C, D),(E));

So my question is:

  • What would be a good __hash__ implementation in my case, so that set(trees) works?
  • Or how can I remove equal objects from my list without defining __hash__?
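
For the second option, one possible approach is a plain pairwise scan that relies on `==` alone. The following is only a sketch under stated assumptions: `unique_by_eq` is a hypothetical helper name, and `Unordered` is a made-up, deliberately unhashable stand-in for `MyTree` so that the example runs without ete3.

```python
def unique_by_eq(items):
    """Return the items in order, dropping any item equal to an earlier one.

    Uses only ==, so no __hash__ is needed, at the cost of a number of
    comparisons that grows quadratically with len(items).
    """
    unique = []
    for item in items:
        # Keep the item only if it equals nothing that was already kept.
        if not any(item == kept for kept in unique):
            unique.append(item)
    return unique


class Unordered(object):
    """Hypothetical stand-in for MyTree: equal if the elements match in any order."""

    __hash__ = None  # deliberately unhashable, so set() cannot be used

    def __init__(self, *elems):
        self.elems = elems

    def __eq__(self, other):
        return sorted(self.elems) == sorted(other.elems)


pairs = [Unordered("D", "C"), Unordered("A", "B"),
         Unordered("C", "D"), Unordered("B", "A")]
unique_pairs = unique_by_eq(pairs)
print(len(unique_pairs))  # 2
```

Applied to the list in the question this would be `unique_by_eq(trees)`, which, going by the equalities listed above, should leave three trees. Every kept object is compared against each new one, so this is only reasonable for fairly short lists or cheap comparisons.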

Source: https://habr.com/ru/post/1272496/