String to Integer Hashing Function with precision

Question

String to Integer Hashing Function with precision

I want to use the hash char array for int or long. The resulting value must adhere to the specified accuracy value. The function I used is below:

int GetHash(const char* zKey, int iPrecision /*= 6*/) { /////FROM : http://courses.cs.vt.edu/~cs2604/spring02/Projects/4/elfhash.cpp unsigned long h = 0; long M = pow(10, iPrecision); while(*zKey) { h = (h << 4) + *zKey++; unsigned long g = h & 0xF0000000L; if (g) h ^= g >> 24; h &= ~g; } return (int) (h % M); }

The string to be hashed is similar to "SAEUI1210.00000010_1".

However, in some cases, this leads to duplication of values. Are there any good alternatives that won't duplicate the same hash for different string values.

+4

c ++ hash

Gayan Jun 18 '09 at 5:14

source share

4 answers

Each hash will have collisions. Period. This is called a happy birthday problem .

You might want to check for cryptographic features like MD5 (relatively fast, and you don't care if it's unsafe), but they will also have conflicts.

+2

sharptooth Jun 18 '09 at 5:20

source share

Hashes generate the same value for different inputs - this is what they do. All you can do is create a hash function with enough distribution or bit depth (or both) to minimize these collisions. Since you have this additional accuracy limit (0-5?), You will be more likely to encounter collisions.

+2

Talljoe Jun 18 '09 at 5:20

source share

MD5 or SHA , There are many open implementations, and the result is unlikely to lead to duplication of results.

+1

Adam matan Jun 18 '09 at 5:29

source share

ASk · Accepted Answer · 2009-06-18T05:20:27+0000

The definition of a hash itself is that it creates duplicate values for some values due to the fact that the range of hash values is smaller than the hashed data space.

In theory, a 32-bit hash has enough range to hash all ~ 6 character strings (only AZ, az, 0-9) without causing a collision. In practice, hashes are not an ideal rearrangement of input data. Given a 32-bit hash, you can expect to get hash collisions after hashing ~ 16 bits of random inputs due to a paradoxical birthday .

Given a static set of data values, it is always possible to construct a hash function created specifically for them that never collides with itself (of course, its output size will be at least log(|data set|) . You should know all possible data values before time.This is called perfect hashing .

As they say, here are a few alternatives that should run you (they are designed to minimize conflicts)

String to Integer Hashing Function with precision

More articles: