There is an ANN library that works well for large dimensional arrays of data, but it is not a complete "database" and is not a distributed solution.
This is where the SpaceCurve script runs (without me) working on a commercial spatial database, so depending on your needs and budget, they can be useful.
As a tip: you should think deeply about what “closest neighbor” means when you talk about “hundreds of thousands of dimensions”. If you take a million random points in a 20-dimensional cube, the average distance between any two nearest neighbors is already about half the length of the edge of the cube.
It only worsens exponentially as measurements are added. When you talk about hundreds of dimensions, you really need incredibly large numbers of points (for example,> 10 30 ) if they are somewhat evenly distributed; and if they are distributed differently, you're better off with other classification approaches.
source share