How to use kd trees to determine string similarity?

Question

How to use kd trees to determine string similarity?

I am trying to use k-nearest neighbors for the string similarity problem, that is, given the string and the knowledge base, I want to output k lines that look like my given string. Are there any tutorials that explain how to use kd trees to efficiently search for k-nearest neighbors for strings? The line length must not exceed 20 characters.

+6

python algorithm kdtree

Legend Apr 17 '11 at 10:23

source share

1 answer

wheaties · Accepted Answer · 2011-04-17T22:26:30+0000

Probably one of the hottest blog posts I read a year ago: Levenstein Automata . Take a look at this article. It provides not only a description of the algorithm, but also code for the subsequent. Technically, this is not a kd-tree, but it is completely connected with string correction algorithms and word correction algorithms that can be found / used in the real world.

He also has another blog post about BK-trees , which is much better suited for fuzzy matching for strings and string search queries where there are incorrect names. Here is another resource containing the source code for BK-tree (this I cannot check the accuracy or the correct implementation.)

How to use kd trees to determine string similarity?

More articles: