A hash that finds close results

I have a table containing about 10,000 entries, each entry has almost 100 boolean values. The user has a checkmark with the bucklers and hopes to get a result that matches their request. If this record does not exist, I want to show them perhaps 5 records that are close (have only 1 or two values). There is a good hash system or data structure that can help me find these results.

+3
source share
3 answers

Raster indexes. Google for paper, if you want a complete background, it's not easy, but worth a read. Basically create bitmpas for your booleans as follows:

010110101010
110100010100
000101001100

XOR , , . ( , () 100 ), , .

: XOR. ( ​​)

000101001100 source
000101001010 target
000000000110 result of XOR

 int n = 0; if (v) do { n++; } while (v &= (v-1)); return(n);

1 , 2 m-2 , m - .

+3

, , - : 5 (, ).

, " ", .

, , kd-tree vp-tree. , , - , kd-, .

+2

It depends on the answer from Kdansky.

Create a dynamic array of entries.
Create a cache.

for each lookup,
   check the cache.
   return the cache entry if the value exists.
   otherwise for each value in the dynamic array,
       if hamming distance is less than threshold add to the result list
   cache the value against the result
   return the result

to find the distance for interference: xor and find the weight for interference http://en.wikipedia.org/wiki/Hamming_weight

+1
source

Source: https://habr.com/ru/post/1774272/


All Articles