Is there an implementation of this string matching method in python?

Question

I'm trying to work out which entries in my data store are near duplicates, using approximate string matching.

Is there an implementation of the following approach in Python, or do I need to roll my own?

Thanks :)

...
A brute-force approach would be to compute the edit distance to P for every substring of T, and then choose the substring with the minimum distance. However, this algorithm would have a running time of O(n^3 m).
A better solution [3][4], which uses dynamic programming, relies on an alternative formulation of the problem: for each position j in the text T and each position i in the pattern P, compute the minimum edit distance between the first i characters of the pattern, P_i, and any substring T_{j',j} of T that ends at position j.
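
The dynamic-programming formulation quoted above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `best_substring_match` is made up for the example, and it assumes unit cost for every insertion, deletion and substitution. The only difference from an ordinary edit-distance table is that an empty pattern prefix costs 0 at every text position, so a match may start anywhere in the text. It runs in O(n m) time.

```python
def best_substring_match(pattern, text):
    """Return (distance, end): the smallest edit distance between `pattern`
    and any substring of `text`, and the text index just past that substring.

    Hypothetical helper for illustration; unit edit costs are assumed.
    """
    m = len(pattern)
    # Column for the empty text: matching the first i pattern characters
    # against nothing costs i deletions.
    prev = list(range(m + 1))
    best_dist, best_end = prev[m], 0

    for j, text_char in enumerate(text, start=1):
        # An empty pattern prefix matches for free at any text position,
        # which is what lets the match start anywhere in the text.
        cur = [0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == text_char else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitute
                           prev[i] + 1,         # extra character in the text
                           cur[i - 1] + 1))     # pattern character left out
        if cur[m] < best_dist:
            best_dist, best_end = cur[m], j
        prev = cur

    return best_dist, best_end


print(best_substring_match("abc", "xxabcxx"))
```

Only the end position is tracked here; recovering the start of the match would need the full table or a second pass over the reversed strings.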

What is the most efficient way to apply this to many strings?

python string fuzzy-search fuzzy-comparison

significance Mar 04 '11 at 10:04

4 answers

John Machin · Answer 1 · 2011-03-04T10:27:52+0000

Yes.

google("python levenshtein")
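
If installing a third-party package is not an option, the Levenshtein distance itself is short enough to write by hand. The following is a sketch using only the standard language, with a hypothetical function name `levenshtein` and unit edit costs assumed. It keeps two rows of the table at a time, so memory stays proportional to the shorter string.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (unit costs assumed)."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a

    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A pure-Python loop like this will be much slower than a C extension, so it is probably only practical for small data sets.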

mgautierfr · Answer 2 · 2011-03-04T10:35:55+0000

difflib.get_close_matches should do the job.
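
A rough sketch of how that could be applied to near-duplicate detection. The helper name `near_duplicates`, the sample records and the cutoff of 0.8 are all made up for illustration; `get_close_matches(word, possibilities, n, cutoff)` is the standard-library call.

```python
from difflib import get_close_matches


def near_duplicates(records, cutoff=0.8):
    """Map each record to the other records that score at least `cutoff`.

    Hypothetical helper; the cutoff value is an arbitrary starting point.
    """
    result = {}
    for index, record in enumerate(records):
        # Compare each record against every other record, but not itself.
        others = records[:index] + records[index + 1:]
        matches = get_close_matches(record, others, n=5, cutoff=cutoff)
        if matches:
            result[record] = matches
    return result


records = ["John Smith", "Jon Smith", "Alice Jones"]
print(near_duplicates(records))
```

Note that this compares every record with every other record, so the cost grows quadratically with the number of records.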

lafras · Answer 3 · 2011-03-04T10:23:18+0000

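Have a look at difflib, for example:

from difflib import context_diff

a = 'acaacbaaca'
b = 'accabcaacc'

# context_diff yields the diff piece by piece; plain strings are compared character by character
print(''.join(context_diff(a, b)))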

sk8asd123 · Answer 4 · 2013-08-02T23:12:53+0000

Try fuzzywuzzy's standard ratio(). fuzzywuzzy is built on top of difflib: http://seatgeek.com/blog/dev/fuzzywuzzy-fuzzy-string-matching-in-python

fuzzywuzzy: https://github.com/seatgeek/fuzzywuzzy

from fuzzywuzzy import fuzz

fuzz.ratio("this is a test", "this is a test!")
    96
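
Since fuzzywuzzy sits on top of difflib, a similar score can be approximated with the standard library alone. This is a sketch, not fuzzywuzzy's actual code: the name `simple_ratio` is made up, and it assumes the score is `SequenceMatcher.ratio()` scaled to 0..100 and truncated to an integer, which reproduces the 96 shown above. Other versions of the library may round instead and give a slightly different number.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    """Similarity of two strings as an integer from 0 to 100.

    Hypothetical stand-in for a fuzzywuzzy-style ratio; truncation
    (rather than rounding) is an assumption.
    """
    matcher = SequenceMatcher(None, s1, s2)
    # ratio() is 2 * M / T: M matched characters, T total characters.
    return int(100 * matcher.ratio())


print(simple_ratio("this is a test", "this is a test!"))  # 96
```

Here 14 characters match out of 29 in total, so the ratio is 28/29, about 0.9655.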
