Nearest Vector Search Algorithms

I have a set of vectors. For the vector in this set, I like to find a subset that is close to this vector. What algorithm can do this.

+4
source share
3 answers

use cosine similarity ( http://en.wikipedia.org/wiki/Cosine_similarity ) among the vectors, and then sort them.

+3
source

This class of algorithms is called Nearest Neighbor or K Nearest Neighbor.

similarity to cosine , as excepeiont says, will work if vector direction is important. If the vector represents a position in space, then any metric to represent the distance in space will work.

For example, Euclidean distance : take the square root of the sum of the squared differences in each dimension. This will give you the distance for each vector, and then sort your set of vectors increasing at that distance.

This process will be O (N) in time. If this is too slow for you, you can look at some common K Nearest Neighbor algorithms.

+3
source

If your problem is with a lot of data:

I published the corresponding algorithm on ddj.com, which finds the closest line to a given point:

Faster search for the nearest line

You will have to change this algorithm, for example, by converting a given vector to a number of points. This will significantly reduce the number of possible matches. Exact match must be checked for every possible match on

  • Find the cutting point of both vectors or
  • Get the distance from the beginning and end of the vector to the possible match, as described in the article
+1
source

Source: https://habr.com/ru/post/1306893/


All Articles