Multidimensional search by combining geospatial indices

Question

Multidimensional search by combining geospatial indices

I am creating an application that stores millions of floating point vectors, with each vector having ~ 100 dimensions. With the query vector, I need to search these vectors for k nearest (Euclidean) matches. Run time should be faster than scanning all millions of vectors. By "vector" I mean in terms of linear algebra a list of 100 floating point numbers, i.e. [0.3, -15.7, 0.004, 457.1, ...]

I know databases like MySQL and MongoDB that provide spatial indexes that work for 2 dimensions. Is there a way to adapt this to many other sizes, possibly with composite indexes? Or are there other other data stores for larger support indexes?

+4

algorithm database vector search geospatial

muudscope May 10, '13 at 19:17

source share

2 answers

mcdowella · Answer 1 · 2013-05-10T20:26:01+0000

If you are looking for exact matches, 100 dimensions is a lot. If you are willing to settle for rough matches, there is a class of locally-sensitive-hashing schemes. You can create a hash or a series of hash values for your datasets and use a regular database or 2-dimensional spatial database to search for matches based on the hash value. One link is http://people.csail.mit.edu/indyk/p117-andoni.pdf .

Uroboros · Answer 2 · 2013-12-16T18:27:01+0000

I can relate to your pain. There is no R-Tree type in MongoDB, I'm not sure if there is any SQL DB. I found the following link useful:

http://www.slideshare.net/nknize/mongo-sv-knizefinal

Multidimensional search by combining geospatial indices

More articles: