How to use kd trees to determine string similarity?

I am trying to use k-nearest neighbors for the string similarity problem, that is, given the string and the knowledge base, I want to output k lines that look like my given string. Are there any tutorials that explain how to use kd trees to efficiently search for k-nearest neighbors for strings? The line length must not exceed 20 characters.

+6
source share
1 answer

Probably one of the hottest blog posts I read a year ago: Levenstein Automata . Take a look at this article. It provides not only a description of the algorithm, but also code for the subsequent. Technically, this is not a kd-tree, but it is completely connected with string correction algorithms and word correction algorithms that can be found / used in the real world.

He also has another blog post about BK-trees , which is much better suited for fuzzy matching for strings and string search queries where there are incorrect names. Here is another resource containing the source code for BK-tree (this I cannot check the accuracy or the correct implementation.)

+7
source

Source: https://habr.com/ru/post/886106/


All Articles