I have a relatively simple thing I want to do:
- Given a query number Q, a query distance d, and a set of numbers S, determine whether S contains any number whose Hamming distance from Q is less than or equal to d.
The simplest solution is to store S as a list and iterate over it, computing the distance to each element. As soon as a distance less than or equal to d turns up, return TRUE.
But given that all I want to do is test for existence, perhaps something faster than a linear scan is possible.
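For reference, the linear scan described above might look like the following minimal sketch. The function names `hamming` and `exists_within` are made up for illustration, and `bin(...).count("1")` stands in for a hardware popcount instruction.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two non-negative integers: popcount of the XOR."""
    return bin(a ^ b).count("1")


def exists_within(q: int, d: int, s) -> bool:
    """Return True as soon as some element of s lies within distance d of q."""
    # any() short-circuits, so the scan stops at the first match.
    return any(hamming(q, x) <= d for x in s)
```

For example, `exists_within(0b0000, 1, [0b0111, 0b0001])` stops at the second element and returns `True`.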
One thing I tried is an M-tree. Working from some other Stack Overflow questions, the Wikipedia article ( https://en.wikipedia.org/wiki/M-tree ), and two existing implementations, I spent several hours yesterday writing my own. One nice property of this problem is that computing a popcount over the XOR of two numbers (using the SSE instruction) is actually cheaper than storing the extra numbers that would let the search avoid computing the metric, so several aspects of the solution can be simplified and optimized for speed.
The results were very disappointing. It turns out that the metric space I'm dealing with is small relative to the minimum Hamming distance I care about. For example, in a space of 12-bit numbers, the maximum Hamming distance is 12. If the minimum I am looking for is 4, that does not leave much room for a good disjoint partition. In fact, I tried exactly that: I used brute force to build a set of 12-bit numbers with a minimum Hamming distance of 4, and then (again by brute force) found the optimal binary tree split, so that the search algorithm would visit the fewest possible nodes. If I want to count the number of set elements within distance d of the query, I can't get the number of node visits below about 30% of the total; if I stop at the first match, it still visits about 4%. This means I have more or less built a linear-time solution, where the overhead of a complex tree search is about the same as the savings from not having to check every element of the set.
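To make the experiment concrete, here is a sketch of one way such a test set could be built. It assumes a greedy scan in increasing numeric order, which is only one possible reading of "brute force"; `greedy_code` is a hypothetical name, and the tree-split optimization is not shown.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance: popcount of the XOR of the two numbers."""
    return bin(a ^ b).count("1")


def greedy_code(bits: int, min_dist: int) -> list:
    """Collect bits-wide numbers whose pairwise Hamming distance is >= min_dist."""
    code = []
    for candidate in range(1 << bits):
        # Keep the candidate only if it is far enough from everything kept so far.
        if all(hamming(candidate, c) >= min_dist for c in code):
            code.append(candidate)
    return code


test_set = greedy_code(12, 4)  # 12-bit numbers, minimum distance 4
```

By construction, every 12-bit number left out of `test_set` lies within distance 3 of some member, so the set cannot be extended without breaking the minimum distance.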
But what I want to do is very limited. I don't need to count the elements of the set within distance d of the query, much less list them. I just want to test for existence. That makes me think of things like Bloom filters and hashes.
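As an illustration of what a hash-based existence test could look like, the sketch below enumerates every number within distance d of Q and probes a hash set for each one. This is an assumed approach written for illustration, not something I have benchmarked, and the names `ball` and `exists_within_hashed` are hypothetical.

```python
from itertools import combinations


def ball(q: int, d: int, bits: int):
    """Yield every bits-wide number within Hamming distance d of q."""
    for k in range(d + 1):
        # Flip each possible choice of k bit positions.
        for positions in combinations(range(bits), k):
            mask = 0
            for p in positions:
                mask |= 1 << p
            yield q ^ mask


def exists_within_hashed(q: int, d: int, members: set, bits: int) -> bool:
    """True if any number within distance d of q is in the hash set."""
    return any(x in members for x in ball(q, d, bits))
```

The number of probes is the size of the Hamming ball, which is the sum of C(bits, k) for k from 0 to d. For 12-bit numbers and d = 4 that is 794 of the 4096 possible values, so whether this beats a scan depends on how large S is.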
. d N, hamming d N- ? , d/2 d/2-1 . - , LDPC, , . , OLSC, , . , d = 4 (SECDED) . BCH DECTED, , . , N d , . , .