By definition, a hash maps some distinct inputs to the same value, because the range of hash values is smaller than the space of data being hashed.
In theory, a 32-bit hash has enough range to hash all 5-character strings (A-Z, a-z, 0-9 only) without causing a collision: 62^5 ≈ 9.2 × 10^8 fits within 2^32 ≈ 4.3 × 10^9, while 62^6 ≈ 5.7 × 10^10 does not. In practice, hashes are not an ideal permutation of the input data. Given a 32-bit hash, you can expect to see collisions after hashing about 2^16 random inputs, because of the birthday paradox.
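To put numbers on the birthday bound, here is a minimal sketch that uses the standard approximation p ≈ 1 - exp(-n(n-1) / 2N) for n inputs and N possible hash values. It assumes the hash output is uniformly random, which real hash functions only approximate; the function names are mine, chosen for illustration.

```python
import math


def collision_probability(n_inputs: int, hash_bits: int = 32) -> float:
    """Approximate chance that at least two of n_inputs hash values collide.

    Assumes every hash value is drawn uniformly at random from
    2**hash_bits possibilities (the usual birthday approximation).
    """
    space = 2 ** hash_bits
    return 1.0 - math.exp(-n_inputs * (n_inputs - 1) / (2.0 * space))


def inputs_for_probability(p: float, hash_bits: int = 32) -> int:
    """Approximate number of inputs needed to reach collision probability p."""
    space = 2 ** hash_bits
    return math.ceil(math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p))))


if __name__ == "__main__":
    # Collision chance after 2**16 inputs with a 32-bit hash.
    print(collision_probability(2 ** 16))
    # Number of inputs at which a collision becomes more likely than not.
    print(inputs_for_probability(0.5))
```

Under that uniformity assumption, 2^16 inputs already give a collision chance of a little under 40%, and the 50% mark falls at roughly 77,000 inputs, far below the 2^32 values the hash can represent.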
Given a static set of data values, it is always possible to construct a hash function designed specifically for that set that never collides on it (of course, its output size must be at least log2(|data set|) bits). The catch is that you need to know all possible data values ahead of time. This is called perfect hashing.
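As an illustration of the idea, here is a minimal sketch that brute-forces a seed until a seeded hash places every key of a known set in its own slot. This is just one simple way to get a perfect hash, not how production generators work, and it is only practical for small key sets. The helper names and the choice of BLAKE2b as the underlying seeded hash are my own assumptions.

```python
import hashlib


def slot_for(key: str, seed: int, table_size: int) -> int:
    """Map key to a slot in [0, table_size) using a seeded BLAKE2b digest."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % table_size


def find_perfect_seed(keys, table_size=None, max_seed=100_000):
    """Search for a seed under which no two keys share a slot.

    table_size defaults to len(keys), which gives a minimal perfect hash.
    Returns (seed, table_size); raises RuntimeError if no seed is found.
    """
    keys = list(keys)
    if len(set(keys)) != len(keys):
        raise ValueError("keys must be distinct")
    size = len(keys) if table_size is None else table_size
    if size < len(keys):
        raise ValueError("table_size must be at least len(keys)")
    for seed in range(max_seed):
        slots = {slot_for(k, seed, size) for k in keys}
        if len(slots) == len(keys):  # every key landed in its own slot
            return seed, size
    raise RuntimeError("no collision-free seed found; try a larger table_size")


if __name__ == "__main__":
    keywords = ["if", "else", "while", "for", "return", "break"]
    seed, size = find_perfect_seed(keywords)
    for word in keywords:
        print(word, "->", slot_for(word, seed, size))
```

With 6 keys and 6 slots, the output needs only 3 bits, matching the log2 bound above. The guarantee holds only for the keys used during the search: any other input can still collide with them.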
That said, here are a few alternatives that should get you started (they are designed to minimize collisions).