I'm wondering how best to approach this particular problem, and whether there are any libraries for it (preferably Python, but I can be flexible if necessary).
I have a file with a string on each line. I would like to find the longest common patterns and their locations in each line. I know that I can use SequenceMatcher to compare lines one and two, one and three, and so on, and then correlate the results, but is there something that already does this?
Ideally, these matches could appear anywhere in each line, but for starters I would be fine with them existing at the same offset in each line, and I could go from there. Something like a compression library that has a good API for accessing its string table might be ideal, but I have not yet found anything that matches this description.
For example, with these lines:
\x00\x00\x8c\x9e\x28\x28\x62\xf2\x97\x47\x81\x40\x3e\x4b\xa6\x0e\xfe\x8b
\x00\x00\xa8\x23\x2d\x28\x28\x0e\xb3\x47\x81\x40\x3e\x9c\xfa\x0b\x78\xed
\x00\x00\xb5\x30\xed\xe9\xac\x28\x28\x4b\x81\x40\x3e\xe7\xb2\x78\x7d\x3e
I would like to see that bytes 0-1 and 10-12 match in all lines at the same position, and that line1[4,5] matches line2[5,6], which in turn matches line3[7,8].
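To make this concrete, here is a rough sketch of the two baseline approaches I have in mind, using the three example lines as Python `bytes`. It is only an illustration, not a finished solution: `same_offset_runs` and `pairwise_blocks` are names I made up, and the minimum match length of 2 is an arbitrary assumption. Only `difflib.SequenceMatcher` and its `get_matching_blocks` method come from the standard library.

```python
from difflib import SequenceMatcher

# The three example lines from above, as raw bytes.
lines = [
    b"\x00\x00\x8c\x9e\x28\x28\x62\xf2\x97\x47\x81\x40\x3e\x4b\xa6\x0e\xfe\x8b",
    b"\x00\x00\xa8\x23\x2d\x28\x28\x0e\xb3\x47\x81\x40\x3e\x9c\xfa\x0b\x78\xed",
    b"\x00\x00\xb5\x30\xed\xe9\xac\x28\x28\x4b\x81\x40\x3e\xe7\xb2\x78\x7d\x3e",
]


def same_offset_runs(lines, min_len=2):
    """Return inclusive (start, end) ranges where every line has the same byte.

    Hypothetical helper; min_len is an assumed threshold for a useful run.
    """
    runs, start = [], None
    width = min(len(line) for line in lines)
    # Iterate one past the end so that a run touching the last byte is closed.
    for i in range(width + 1):
        same = i < width and all(line[i] == lines[0][i] for line in lines)
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs


def pairwise_blocks(a, b, min_len=2):
    """Return (offset_in_a, offset_in_b, length) matches between two lines.

    Hypothetical helper wrapping difflib; drops the zero-length sentinel
    block and anything shorter than min_len.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(m.a, m.b, m.size)
            for m in matcher.get_matching_blocks()
            if m.size >= min_len]


print(same_offset_runs(lines))             # [(0, 1), (10, 12)]
print(pairwise_blocks(lines[0], lines[1])) # [(0, 0, 2), (4, 5, 2), (9, 9, 4)]
print(pairwise_blocks(lines[0], lines[2])) # [(0, 0, 2), (4, 7, 2), (10, 10, 3)]
```

The first function covers the same-offset case directly. The second gives the pairwise results that I would still have to correlate across all lines by hand, which is the part I am hoping an existing library already handles.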
Thanks,