Building a "sparse" array of queries that minimize memory

let's say I want to build an array to do a search to analyze network protocols (like ethertype). Since such an identifier is 2 bytes long, I get an array of 2 ^ 16 cells if I use direct indexing: this is a real waste, because it is very likely that the array is sparse - that is, many spaces in the array.

To minimize memory usage, I would use an ideal hash generator such as CMPH so that I can display my "n" identifiers for an n-size array without any collision. The disadvantage of this approach is that I have to rely on an external "exoteric" library.

I am wondering - in my case - there are smarter ways to have a constant search for time, while preserving the use of memory; Keep in mind that I'm interested in indexing 16-bit unsigned numbers , and the size of the set is pretty limited.

thanks

+4
source share
1 answer

Since you know that you are dealing with 16-bit values, any search algorithm will be a constant-time algorithm, since there are only O (1) different possible values. Therefore, algorithms that can be slower on the surface (for example, a linear search that works in O (n) for n elements) can be useful here.

Without the perfect hash function, if you want to guarantee a quick search, I would suggest looking at the cuckoo hash , which guarantees the worst case O (1), and expect O (1) to be a temporary insert (although you should be a little smart with your hash -functions). It is very easy to generate hash functions for 16-bit values; if you calculate two 16-bit multipliers and multiply the high and low bits of a 16-bit value by them, and then add them together, I believe that you get a good hash function for any prime number.

Alternatively, if you do not have to have O (1) lookup and in order with the expected lookup time, you can also use a standard hash table with open addressing, such as a hash table of linear probes or a hash table with double hashing . Using a smaller array with such a hash scheme can be extremely fast and should be very easy to implement.

For a completely different approach, if you store sparse data and want to quickly find the search time, it may be useful for you to use a simple balanced binary search tree. For example, the treap data structure is simple to implement and gives the expected O (log n) search results for values. Since you are dealing with 16-bit values, here log n is about 16 (I think the base of the logarithm is actually a bit different), so the search should be pretty fast. This introduces a bit of overhead for each item, but if you only have a few items, this should be easy to implement. For even less overhead, you might want to look into splay trees , which requires only two element pointers.

Hope this helps!

+5
source

Source: https://habr.com/ru/post/1395486/


All Articles