Mapping a hash key to a bucket

Which algorithm gives the best distribution when mapping a hash key to a bucket?

In other words, say I have a hash function (perhaps SHA-1) and n buckets. What algorithm should I use to map the key to a bucket? For instance: the low bits, the high bits, or something else?

2 answers

Usually you simply take the hash value modulo the number of buckets. When the number of buckets is a power of two, you can use a bitwise AND with (number of buckets - 1) instead.
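As a minimal sketch of the two mappings just described (the function names `bucket_mod` and `bucket_mask` are made up for illustration, and the hash value is assumed to be a non-negative integer):

```python
def bucket_mod(hash_value: int, n_buckets: int) -> int:
    """Map a non-negative hash value to a bucket index in [0, n_buckets)."""
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    return hash_value % n_buckets


def bucket_mask(hash_value: int, n_buckets: int) -> int:
    """Same mapping via bitwise AND; only valid when n_buckets is a power of two."""
    if n_buckets <= 0 or (n_buckets & (n_buckets - 1)) != 0:
        raise ValueError("n_buckets must be a power of two")
    # For n = 2**k, (n - 1) has exactly the low k bits set, so the AND keeps
    # the low k bits of the hash, which equals hash_value % n.
    return hash_value & (n_buckets - 1)
```

For a power-of-two bucket count both functions return the same index; the mask is just a cheaper way of computing the remainder.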

An excerpt from the Wikipedia article on hash functions:

A common solution is to compute a fixed hash function with a very large range (say, 0 to 2^32 - 1), divide the result by n, and use the remainder of the division. If n is itself a power of 2, this can be done by bit masking and bit shifting. When this approach is used, the hash function must be chosen so that the result has a fairly uniform distribution between 0 and n-1, for any value of n that may occur in the application. Depending on the function, the remainder may be uniform only for certain values of n, e.g. odd or prime numbers.
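The last point of the excerpt can be illustrated with a deliberately weak hash. The sketch below assumes a hypothetical hash whose outputs are all multiples of 4 (as might happen if aligned addresses were used as hash values) and counts how many values land in each bucket; `bucket_counts` is an illustrative helper, not a library function.

```python
from collections import Counter


def bucket_counts(hash_values, n_buckets):
    """Return a list whose i-th entry is the number of hashes falling in bucket i."""
    counts = Counter(h % n_buckets for h in hash_values)
    return [counts.get(i, 0) for i in range(n_buckets)]


# Hypothetical weak hash: every output is a multiple of 4.
weak_hashes = [4 * i for i in range(56)]

# With 8 buckets only buckets 0 and 4 are ever used.
counts_pow2 = bucket_counts(weak_hashes, 8)

# With 7 buckets (a prime sharing no factor with 4) every bucket is used equally.
counts_prime = bucket_counts(weak_hashes, 7)
```

With a well-mixed hash such as SHA-1 this effect should not appear, which is why the choice of n matters mostly for weaker hash functions.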


SHA-1 and other cryptographic hash functions should already give you a fairly even distribution; as a rule, they behave like a random function (one that produces every output with equal probability).

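So just take as many bits of the function's output as you need to get a number in the desired range. Because the output behaves like random data, it should not matter whether you take the low bits or the high bits.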
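A minimal sketch of this bit selection, assuming Python's standard `hashlib` and a bucket count of 2**bits; the function names are made up for illustration:

```python
import hashlib

SHA1_BITS = 160  # length of a SHA-1 digest in bits


def sha1_bucket_low(key: bytes, bits: int) -> int:
    """Bucket index in [0, 2**bits) taken from the low bits of SHA-1(key)."""
    if not 0 <= bits <= SHA1_BITS:
        raise ValueError("bits must be between 0 and 160")
    value = int.from_bytes(hashlib.sha1(key).digest(), "big")
    return value & ((1 << bits) - 1)


def sha1_bucket_high(key: bytes, bits: int) -> int:
    """Bucket index in [0, 2**bits) taken from the high bits of SHA-1(key)."""
    if not 0 <= bits <= SHA1_BITS:
        raise ValueError("bits must be between 0 and 160")
    value = int.from_bytes(hashlib.sha1(key).digest(), "big")
    return value >> (SHA1_BITS - bits)
```

If the number of buckets is not a power of two, reducing the full 160-bit value modulo n is a common alternative; the resulting bias is negligible when the hash range is that much larger than n.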

You should study the literature on hash functions and hash tables to understand the space better, so that you can make informed choices according to your requirements. You can start with Wikipedia or an algorithms textbook such as CLR. Eventually you will want to move on to Knuth.


Source: https://habr.com/ru/post/1379486/

