I am trying to create a nested or recursive effect using SequenceMatcher.
The ultimate goal is to compare two sequences, both of which can contain instances of different types.
For example, sequences may be:
    l1 = [1, "Foo", "Bar", 3]
    l2 = [1, "Fo", "Bak", 2]
Normally, SequenceMatcher identifies only [1] as a common subsequence of l1 and l2.
I would like SequenceMatcher to be applied a second time to string elements, so that "Foo" and "Fo" are considered equal, as are "Bar" and "Bak", and the longest common subsequence has length 3: [1, Foo/Fo, Bar/Bak]. That is, I would like SequenceMatcher to be more forgiving when comparing string members.
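To make the starting point concrete, here is a minimal sketch of the default behaviour on the two lists above, together with the character-level ratios of the string pairs I would like to treat as equal. The variable names are only illustrative, and the 0.5 threshold is simply the cut-off I picked.

```python
from difflib import SequenceMatcher

l1 = [1, "Foo", "Bar", 3]
l2 = [1, "Fo", "Bak", 2]

# Element-level comparison: only the leading 1 matches exactly.
outer = SequenceMatcher(a=l1, b=l2)
blocks = [tuple(m) for m in outer.get_matching_blocks()]
outer_ratio = outer.ratio()  # 2 * 1 match / 8 elements in total

# Character-level comparison of the string pairs I would like to treat as equal.
foo_ratio = SequenceMatcher(a="Foo", b="Fo").ratio()
bar_ratio = SequenceMatcher(a="Bar", b="Bak").ratio()

print(blocks)       # the last block is the zero-length sentinel
print(outer_ratio)
print(foo_ratio, bar_ratio, foo_ratio > 0.5 and bar_ratio > 0.5)
```

Both string pairs clear the 0.5 threshold on their own, which is why a second, inner comparison looks attractive.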
I tried to write a wrapper for the str built-in class:
    from difflib import SequenceMatcher

    class myString:
        def __init__(self, string):
            self.string = string

        def __hash__(self):
            return hash(self.string)

        def __eq__(self, other):
            # Fuzzy equality: compare against the other wrapper's string.
            return SequenceMatcher(a=self.string, b=other.string).ratio() > 0.5
Edit: a possibly more elegant way:
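    class myString(str):
        def __eq__(self, other):
            return SequenceMatcher(a=self, b=other).ratio() > 0.5

        # Python 3 drops the inherited __hash__ once __eq__ is defined; keep str's.
        __hash__ = str.__hash__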
Having done this, you can do the following:
    >>> Foo = myString("Foo")
    >>> Fo = myString("Fo")
    >>> Bar = myString("Bar")
    >>> Bak = myString("Bak")
    >>> l1 = [1, Foo, Bar, 3]
    >>> l2 = [1, Fo, Bak, 2]
    >>> SequenceMatcher(a=l1, b=l2).ratio()
    0.75
So this appears to work, but I have a bad feeling about overriding the hash function. When is the hash actually used? Where can it come back and bite me?
SequenceMatcher documentation says the following:
This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable.
And by definition, hashable elements must fulfill the following requirement:
Hashable objects which compare equal must have the same hash value.
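A minimal sketch of where I suspect this could bite: as far as I can tell, SequenceMatcher indexes the elements of the second sequence in a dict, and a dict lookup compares hashes before it ever calls `__eq__`. The class `FuzzyStr` below is a hypothetical stand-in for my wrapper (fuzzy `__eq__`, exact hash inherited from str), not something taken from difflib.

```python
from difflib import SequenceMatcher


class FuzzyStr(str):
    """Hypothetical str subclass: fuzzy __eq__, exact hash inherited from str."""

    def __eq__(self, other):
        if not isinstance(other, str):
            return False
        # Compare the plain strings character by character.
        return SequenceMatcher(a=str(self), b=str(other)).ratio() > 0.5

    # Defining __eq__ clears the inherited __hash__ in Python 3; restore it.
    __hash__ = str.__hash__


foo, fo = FuzzyStr("Foo"), FuzzyStr("Fo")

equal = foo == fo                  # the fuzzy comparison reports equality
same_hash = hash(foo) == hash(fo)  # almost certainly False for different text
found = fo in {foo: "value"}       # dict lookup checks the hash before __eq__

print(equal, same_hash, found)
```

If the two hashes differ, the dict lookup misses even though `foo == fo` holds, which is exactly the situation the hashability requirement rules out.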
Also, do I need to override __cmp__ as well?
I would like to hear about other solutions that come to mind.
Thanks.