Efficient comparison of 1 million vectors containing (float, integer) tuples

Question


I am working on a chemistry / biology project. We are building a web application for quickly matching a user's experimental data against predicted data in a reference database. The reference database will contain up to a million entries. The data for one entry is a list (vector) of tuples, each containing a floating-point value between 0.0 and 20.0 and an integer value between 1 and 18. For example: (7.2394, 2), (7.4011, 1), (9.9367, 3), etc. The user enters a similar list of tuples, and the web application should then return the 50 best-matching database entries.

One thing is important: the search algorithm must tolerate discrepancies between the query data and the reference data, since both may contain small errors in the float values (NOT in the integer values). The query data may contain errors because it is derived from a real experiment, and the reference data because it is the result of a prediction.

Edit: moved part of the text to an answer.

How can we efficiently rank 1 query against 1 million records?


performance comparison math algorithm database

Simmer Feb 22 '10 at 12:45


5 answers

Andrew McGregor · Answer 1 · 2010-02-22T12:59:49+0000

1 ; , .

, , - , . , , ; , , . .

, ... , ( , , )?

, , - , , 1 . , Python . , . Python, .

import random
r = [(random.uniform(0, 20), random.randint(1, 18)) for i in range(1000000)]
# this is a decorate-sort-undecorate pattern
# look for matches to (7, 9)
# obviously, you can use whatever distance expression you want;
# abs() is applied per term so the two differences cannot cancel each other
zz = [(abs(7 - x) + abs(9 - y), x, y) for x, y in r]
zz.sort()
# return the 50 best matches
best = [(x, y) for a, x, y in zz[:50]]
print(best)
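
The snippet above ranks single tuples, whereas the question asks for whole lists of tuples to be ranked with a tolerance on the float values only. A rough sketch of one way that could look is given here. The names `score` and `best_matches`, the tolerance `tol=0.05`, and the scoring rule (counting query tuples that have a partner in the record) are all assumptions made for illustration and do not come from the question. `heapq.nlargest` is used so that the full list of scores does not have to be sorted.

```python
import bisect
import heapq


def score(query, record, tol=0.05):
    """Count query tuples that have a partner in record.

    A partner has exactly the same integer and a float within +/- tol.
    Simplification: several query tuples may match the same record tuple.
    """
    # Group the record's float values by integer label and sort each group.
    by_label = {}
    for value, label in record:
        by_label.setdefault(label, []).append(value)
    for values in by_label.values():
        values.sort()

    matched = 0
    for value, label in query:
        values = by_label.get(label)
        if not values:
            continue
        # First record value that is >= value - tol, found by binary search.
        i = bisect.bisect_left(values, value - tol)
        if i < len(values) and values[i] <= value + tol:
            matched += 1
    return matched


def best_matches(query, database, k=50, tol=0.05):
    """Return the indices of the k highest-scoring records, best first."""
    return heapq.nlargest(
        k, range(len(database)), key=lambda i: score(query, database[i], tol)
    )
```

This is still a linear scan over every record, so it only illustrates the tolerance rule. In a real application the per-record grouping would presumably be precomputed once rather than rebuilt for each query.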

Karussell · Answer 2 · 2010-02-22T13:07:51+0000

:-) , . :

: " SEQUEST - - , ".

Stefano Borini · Answer 3 · 2010-02-22T12:47:43+0000

? , , . , . , . , , .

log (n)

p.marino · Answer 4 · 2010-02-22T12:55:39+0000

"" x-y , , / ( ).

.

Simmer · Answer 5 · 2010-02-22T13:44:44+0000

, , , - float. . 0,1, 0,2, 0,3 0,4. , binning 50 200 , 0 18, 0 , . . . , . , , .

( ) , , , float. 1. , . .

- . . , (PCA),
