How to find k-nearest values โ€‹โ€‹in n-dimensional space?

I read about kd trees, but they are ineffective when the dimension of space is high. I have a database with values โ€‹โ€‹and I want to find values โ€‹โ€‹that are within a certain distance from the query request. For example, a database is a list of 32-bit numbers, and I want to find all numbers that differ from the query value by less than 3 bits.

I heard somewhere about MultiVariate Partition trees, but could not find a good link. I know that min-Hash gives a good approximation, better, but I need an exact answer.

+4
source share
1 answer

The distance from interference is closely related to levenshtein distance and is similar to the algorithms used for spelling correction.

Used branch-and-bound method to search in trie . It takes time that exponentially at a distance, at close range, to a linear-sized dictionary.

If the dictionary contains binary words stored in binary trie, with a strict distance for hamming, here is a simple pseudo-code:

walk(trie, word, i, hit, budget){ if (budget < 0 || i > word.length) return; if (trie==NULL){ if (i==word.length) print hit; return; } hit[i] = 0; walk(trie.subtrie[0], word, i+1, hit, (word[i]==0 ? budget : budget-1)); hit[i] = 1; walk(trie.subtrie[1], word, i+1, hit, (word[i]==1 ? budget : budget-1)); } main(){ for (int budget = 0; ; budget++){ walk(trie, word, 0, hit, budget); /* quit if enough hits have been printed */ } } 

The idea is that you walk around the whole trail, tracking the distance between the current trie node and the original word. You crop your search by having a budget for the distance that you will tolerate. This works because the distance can never decrease as you go deeper into the trie.

Then you do this several times when budgets start from scratch and increase in steps until you print out the strokes you need. Since each walk covers as many less knots as a subsequent walk does not stop you from taking a few walks. If k fixed, you can just start with this as your budget.

+1
source

Source: https://habr.com/ru/post/1303336/


All Articles