I am creating an application that stores millions of floating-point vectors, each with ~100 dimensions. Given a query vector, I need to find its k nearest neighbors (by Euclidean distance) among these stored vectors. The search should be faster than a linear scan over all of the millions of vectors. By "vector" I mean, in the linear-algebra sense, a list of 100 floating-point numbers, e.g. [0.3, -15.7, 0.004, 457.1, ...]
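To make the requirement concrete, below is a minimal sketch of the brute-force linear scan that an index would need to beat. It assumes the vectors are held in memory as plain Python lists. The function names `squared_euclidean` and `knn_linear_scan` are hypothetical and only illustrate the O(n·d) baseline cost per query.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance. It ranks neighbors the same way as the
    true distance, so the sqrt can be skipped during the scan."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_linear_scan(
    data: List[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k nearest vectors, closest first.

    Every stored vector is compared with the query: n vectors * d dimensions
    of work per query, which is the cost an index should avoid.
    """
    nearest = heapq.nsmallest(
        k,
        ((i, squared_euclidean(v, query)) for i, v in enumerate(data)),
        key=lambda pair: pair[1],
    )
    # Convert squared distances back to true Euclidean distances for output.
    return [(i, math.sqrt(d2)) for i, d2 in nearest]


if __name__ == "__main__":
    random.seed(0)
    dim = 100
    # Hypothetical synthetic data standing in for the real stored vectors.
    data = [[random.uniform(-500.0, 500.0) for _ in range(dim)] for _ in range(10_000)]
    query = data[42]  # the query's nearest neighbor is itself, at distance 0
    for idx, dist in knn_linear_scan(data, query, k=3):
        print(idx, round(dist, 3))
```

With millions of vectors this scan touches every stored value on every query, which is exactly what the question is trying to avoid.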
I know that databases like MySQL and MongoDB provide spatial indexes that work for 2 dimensions. Is there a way to adapt these to many more dimensions, possibly with composite indexes? Or are there other data stores whose indexes support higher-dimensional data?