I am working on a chemistry/biology project. We are building a web application for quickly matching a user's experimental data against predicted data in a reference database. The reference database will contain up to a million entries. The data for one entry is a list (vector) of tuples, each containing a floating-point value between 0.0 and 20.0 and an integer value between 1 and 18, for example (7.2394, 2), (7.4011, 1), (9.9367, 3), etc. The user enters a similar list of tuples, and the web application should then return the 50 best-matching database entries.
One thing is important: the search algorithm must tolerate discrepancies between the query data and the reference data, since both may contain small errors in the float values (NOT in the integer values). The query data may contain errors because it is derived from a real experiment, and the reference data because it is the result of a prediction.
Edit: text moved to an answer.
How can we efficiently rank one query against 1 million records?
, , - , , 1 . , Python . , . Python, .
    import random

    # Build a mock database of one million (float, int) tuples.
    r = [(random.uniform(0, 20), random.randint(1, 18)) for i in range(1000000)]

    # This is a decorate-sort-undecorate pattern.
    # Look for matches to (7, 9).
    # Obviously, you can use whatever distance expression you want.
    zz = [(abs(7 - x) + abs(9 - y), x, y) for x, y in r]
    zz.sort()

    # Return the 50 best matches.
    [(x, y) for a, x, y in zz[:50]]
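Since only the 50 best matches are needed, the full sort can be skipped. The following is a minimal sketch, not part of the original answer, that uses the standard-library `heapq.nsmallest` for the same single-tuple lookup. The function name `best_matches`, the sum-of-absolute-differences distance, and the smaller mock database are assumptions made for illustration.

```python
import heapq
import random


def best_matches(records, target, k=50):
    """Return the k records closest to target (hypothetical helper)."""
    tx, ty = target
    # Assumed distance: sum of absolute differences; swap in any other metric.
    # nsmallest tracks only k candidates at a time, so no full sort is needed.
    return heapq.nsmallest(
        k, records, key=lambda rec: abs(tx - rec[0]) + abs(ty - rec[1])
    )


random.seed(0)  # fixed seed so repeated runs give the same mock data
records = [(random.uniform(0, 20), random.randint(1, 18)) for _ in range(100000)]
top = best_matches(records, (7, 9))
print(len(top), top[0])
```

For small `k` this does roughly one pass over the data, which may matter more once each record is a whole list of tuples and the distance is costlier to compute.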
: " SEQUEST - - , ".
log (n)
"" x-y , , / ( ).
, , , - float. . 0,1, 0,2, 0,3 0,4. , binning 50 200 , 0 18, 0 , . . . , . , , .
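The surviving fragments above mention binning the float values into 50 to 200 bins, which corresponds to bin widths of 0.4 down to 0.1 over the 0.0 to 20.0 range. The sketch below shows one way such a binned index might look. Everything in it is an assumption for illustration: the `BIN_WIDTH` of 0.1, the names `bin_of`, `build_index`, and `query`, the neighbor-bin lookup used to tolerate small float errors, and the simple vote count used as a score.

```python
from collections import Counter, defaultdict

BIN_WIDTH = 0.1  # assumed width; 20.0 / 0.1 gives 200 float bins


def bin_of(value):
    """Map a float in [0.0, 20.0] to its bin number."""
    return int(value / BIN_WIDTH)


def build_index(database):
    """Map each (float bin, integer) pair to the ids of records containing it."""
    index = defaultdict(set)
    for rec_id, peaks in enumerate(database):
        for value, label in peaks:
            index[(bin_of(value), label)].add(rec_id)
    return index


def query(index, peaks, k=50):
    """Rank records by how many query tuples they match (hypothetical scoring)."""
    votes = Counter()
    for value, label in peaks:
        b = bin_of(value)
        hits = set()
        # Also check adjacent bins so a small float error near a bin edge
        # still matches; the integer must match exactly.
        for nb in (b - 1, b, b + 1):
            hits |= index.get((nb, label), set())
        for rec_id in hits:
            votes[rec_id] += 1
    return votes.most_common(k)


database = [
    [(7.2394, 2), (7.4011, 1), (9.9367, 3)],
    [(7.2394, 1), (12.5, 4)],
    [(3.0, 2), (9.95, 3)],
]
index = build_index(database)
result = query(index, [(7.25, 2), (7.39, 1), (9.94, 3)])
print(result)
```

The vote count is only a coarse filter. The short list it returns could then be re-ranked with a more precise distance on the float values.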
( ) , , , float. 1. , . .
- . . , (PCA),