How to optimize / improve this hash function

Question

How to optimize / improve this hash function

I have a hash table in which quadtree entries are stored.
The hash function looks like this:

Quadtree hash

#define node_hash(a,b,c,d) \ (((int)(d))+3*(((int)(c))+3*(((int)(b))+3*((int)(a))+3)))

Note that the result of this operation is always reset using a simple simple module:

 h = node_hash(p->nw, p->ne, p->sw, p->se) ; h %= hashprime ; ...

Comparison with Optimal Hash
Some statistical analysis shows that this hash is optimal in terms of collision reduction. Given a hash table with b and n buckets. The risk of collision using an ideal hash:
(n - b * (1 - power((b-1)/b,n)))) * 100 / n
When n = b, this means a collision risk of 37%.

Some tests show that the above hash strings work very well with the norm (for all levels of filling the hash table).

Working hours
Runtime is highly dependent on hashprime

Timing (best of 1000 starts):

 hashprime CPU-cycles per run -------------------------------- 4049 56 16217 68 64871 127 <-- whoooh

Is there a way to improve this while maintaining optimal collision risk?

Or by optimizing the operation of the module (replacing it with multiplication using the "magic" numbers of the computer outside the loop).
Replacing a hash function with another hash function.

Background
The following assembly is produced:

 //--------h = node_hash(p->nw, p->ne, p->sw, p->se) ; mov eax,[rcx+node.nw] <<+ lea eax,[eax+eax*2+3] | add eax,[rcx+node.ne] | lea eax,[eax+eax*2] +- takes +/- 12 cycles add eax,[rcx+node.sw] | lea eax,[eax+eax*2] | add eax,[rcx+node.se] <<+ //--------h %= hashprime ; mov esi,[hashprime] xor edx,edx div esi mov rax,rdx <<--- takes all the rest

[EDIT]
I can do something with the fact that:

C = A % B equivalent to C = A – B * (A / B)
Using the fact that integer division is the same as multiplying by its inverse.
Thus, the transformation of the formula into C = A - B * (A * rB)
Note that for integer division the inverse is magic number, see http://www.hackersdelight.org/magic.htm
C here: http://web.archive.org/web/20070713211039/http://hackersdelight.org/HDcode/magic.c

[FNV hashes]

See: http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1a

 hash = offset_basis for each byte to be hashed hash = hash xor octet_of_data hash = hash * FNV_prime (for 32 bits = 16777619) return hash

For 4 pointers truncated to 32 bits = 16 bytes, the FNV hash takes ~~27 cycles (manual assembly)~~
Unfortunately, this leads to hash collisions of 81%, where 37% should be.
Performing the full 15 multiplications takes 179 cycles.

+6

c ++ optimization c math

Johan Sep 27 '13 at 3:48

source share

3 answers

Johan · Answer 1 · 2013-09-28T04:56:13+0000

Module replacement by mutual multiplication
The main loop enthusiast in this hash function is the module operator.

If you replace this division by multiplying by the inverse, the calculation will be much faster.
Note that reciprocity calculation involves 3 divisions, so this should only be done when the responder can be reused enough times.

OK, the code is used here: http://www.agner.org/optimize/asmlib.zip

From: http://www.agner.org/optimize/

 // ;************************* divfixedi64.asm ********************************* // ; Author: Agner Fog //extern "C" void setdivisoru32(uint Buffer[2], uint d) asm mov r8d, edx // x mov r9, rcx // Buffer dec r8d // r8d = r8d or esi mov ecx, -1 // value for bsr if r8d = 0 bsr ecx, r8d // floor(log2(d-1)) inc r8d inc ecx // L = ceil(log2(d)) mov edx, 1 shl rdx, cl // 2^L (64 bit shift because cl may be 32) sub edx, r8d xor eax, eax div r8d inc eax mov [r9], eax // multiplier sub ecx, 1 setae dl movzx edx, dl // shift1 seta al neg al and al,cl movzx eax, al // shift 2 shl eax, 8 or eax, edx mov [r9+4], eax // shift 1 and shift 2 ret end;

and code for the module:

 //extern "C" uint modFixedU32(uint Buffer[2], uint d) asm mov eax, edx mov r10d, edx // x mov r11d, edx // save x mul dword [rcx] // Buffer (ie: m') sub r10d, edx // xt mov ecx, [rcx+4] // shift 1 and shift 2 shr r10d, cl lea eax, [r10+rdx] mov cl, ch shr eax, cl // Result:= x - m * fastDiv32.dividefixedu32(Buffer, x); mul r8d // m * ... sub r11d, eax // x - (m * ...) mov eax,r11d ret end;

The time difference is as follows:

 hashprime classic hash (mod) new hash new old (# of runs) cycles/run per run (no cache) (no cache) -------------------------------------------------------------------- 4049 56 21 16.6 51 16217 68 not measured 64871 127 89 16.5 50

Cache issues
The increase in cycle time is caused by data cache overflow, which leads to access to the main memory.
This can be clearly seen when I remove the effects of the cache by repeating the same value over and over.

tmyklebu · Answer 2 · 2013-09-27T05:59:38+0000

Something like this might be useful:

 static inline unsigned int hash4(unsigned int a, unsigned int b, unsigned int c, unsigned int d) { unsigned long long foo = 123456789*(long long)a ^ 243956871*(long long)b ^ 918273645*(long long)c ^ 347562981*(long long)d; return (unsigned int)(foo >> 32); }

Replace the four odd numbers I entered with randomly generated 64-bit odd numbers; those above will not work so cool. (64-bit so that 32-bit bits are a random combination of the least significant bits). This is about as fast as the code you gave, but it allows you to use the sizes of two or two tables instead of simple table sizes without fear.

What everyone uses for such loads is the FNV hash . I'm not sure if FNV has better properties than hashes of this type, but it is just as quickly and fairly widely used.

larsmoa · Answer 3 · 2013-09-27T07:23:13+0000

Assuming that hashprime is a constant, you can implement a modular operation as bit masks. I'm not sure of the details, but perhaps this answer may push you in the right direction.

How to optimize / improve this hash function

More articles: