Multidimensional search by combining geospatial indices

I am creating an application that stores millions of floating point vectors, with each vector having ~ 100 dimensions. With the query vector, I need to search these vectors for k nearest (Euclidean) matches. Run time should be faster than scanning all millions of vectors. By "vector" I mean in terms of linear algebra a list of 100 floating point numbers, i.e. [0.3, -15.7, 0.004, 457.1, ...]

I know databases like MySQL and MongoDB that provide spatial indexes that work for 2 dimensions. Is there a way to adapt this to many other sizes, possibly with composite indexes? Or are there other other data stores for larger support indexes?

+4
source share
2 answers

If you are looking for exact matches, 100 dimensions is a lot. If you are willing to settle for rough matches, there is a class of locally-sensitive-hashing schemes. You can create a hash or a series of hash values ​​for your datasets and use a regular database or 2-dimensional spatial database to search for matches based on the hash value. One link is http://people.csail.mit.edu/indyk/p117-andoni.pdf .

+3
source

I can relate to your pain. There is no R-Tree type in MongoDB, I'm not sure if there is any SQL DB. I found the following link useful:

http://www.slideshare.net/nknize/mongo-sv-knizefinal

0
source

Source: https://habr.com/ru/post/1480164/


All Articles