Java: the simplest integer hash

I need a quick hash function for integers:

int hash(int n) { return ...; } 

Is there something that already exists in Java?

The minimum properties I need are:

  • hash(n) & 1 does not appear periodically when used with a series of consecutive n values.
  • hash(n) & 1 approximately equally likely 0 or 1.
+6
source share
4 answers

I conducted several experiments (see the test program below); computing 2 ^ n in Galois fields and gender (A * sin (n)), both did very well to create a sequence of "random" bits. I tried multiplicative congruent random number generators and some algebra and CRC (which is similar to k * n in Galois fields), none of which succeeded.

The gender approach (A * sin (n)) is the simplest and fastest; computing 2 ^ n in GF32 takes about 64 multiplications and 1024 XORs in the worst case, but the frequency of the output bits is extremely clear in the context of linear feedback shift registers.

 package com.example.math; public class QuickHash { interface Hasher { public int hash(int n); } static class MultiplicativeHasher1 implements Hasher { /* multiplicative random number generator * from L'Ecuyer is x[n+1] = 1223106847 x[n] mod (2^32-5) * http://dimsboiv.uqac.ca/Cours/C2012/8INF802_Hiv12/ref/paper/RNG/TableLecuyer.pdf */ final static long a = 1223106847L; final static long m = (1L << 32)-5; /* * iterative step towards computing mod m * (j*(2^32)+k) mod (2^32-5) * = (j*(2^32-5)+j*5+k) mod (2^32-5) * = (j*5+k) mod (2^32-5) * repeat twice to get a number between 0 and 2^31+24 */ private long quickmod(long x) { long j = x >>> 32; long k = x & 0xffffffffL; return j*5+k; } // treat n as unsigned before computation @Override public int hash(int n) { long h = a*(n&0xffffffffL); long h2 = quickmod(quickmod(h)); return (int) (h2 >= m ? (h2-m) : h2); } @Override public String toString() { return getClass().getSimpleName(); } } /** * computes (2^n) mod P where P is the polynomial in GF2 * with coefficients 2^(k+1) represented by the bits k=31:0 in "poly"; * coefficient 2^0 is always 1 */ static class GF32Hasher implements Hasher { static final public GF32Hasher CRC32 = new GF32Hasher(0x82608EDB, 32); final private int poly; final private int ofs; public GF32Hasher(int poly, int ofs) { this.ofs = ofs; this.poly = poly; } static private long uint(int x) { return x&0xffffffffL; } // modulo GF2 via repeated subtraction int mod(long n) { long rem = n; long q = uint(this.poly); q = (q << 32) | (1L << 31); long bitmask = 1L << 63; for (int i = 0; i < 32; ++i, bitmask >>>= 1, q >>>= 1) { if ((rem & bitmask) != 0) rem ^= q; } return (int) rem; } int mul(int x, int y) { return mod(uint(x)*uint(y)); } int pow2(int n) { // compute 2^n mod P using repeated squaring int y = 1; int x = 2; while (n > 0) { if ((n&1) != 0) y = mul(y,x); x = mul(x,x); n = n >>> 1; } return y; } @Override public int hash(int n) { return pow2(n+this.ofs); } @Override public String toString() { return String.format("GF32[%08x, ofs=%d]", this.poly, this.ofs); } } static class QuickHasher implements Hasher { @Override public int hash(int n) { return (int) ((131111L*n)^n^(1973*n)%7919); } @Override public String toString() { return getClass().getSimpleName(); } } // adapted from http://www.w3.org/TR/PNG-CRCAppendix.html static class CRC32TableHasher implements Hasher { final private int table[]; static final private int polyval = 0xedb88320; public CRC32TableHasher() { this.table = make_table(); } /* Make the table for a fast CRC. */ static public int[] make_table() { int[] table = new int[256]; int c; int n, k; for (n = 0; n < 256; n++) { c = n; for (k = 0; k < 8; k++) { if ((c & 1) != 0) c = polyval ^ (c >>> 1); else c = c >>> 1; } table[n] = (int) c; } return table; } public int iterate(int state, int i) { return this.table[(state ^ i) & 0xff] ^ (state >>> 8); } @Override public int hash(int n) { int h = -1; h = iterate(h, n >>> 24); h = iterate(h, n >>> 16); h = iterate(h, n >>> 8); h = iterate(h, n); return h ^ -1; } @Override public String toString() { return getClass().getSimpleName(); } } static class TrigHasher implements Hasher { @Override public String toString() { return getClass().getSimpleName(); } @Override public int hash(int n) { double s = Math.sin(n); return (int) Math.floor((1<<31)*s); } } private static void test(Hasher hasher) { System.out.println(hasher+":"); for (int i = 0; i < 64; ++i) { int h = hasher.hash(i); System.out.println(String.format("%08x -> %08x %%2 = %d", i,h,(h&1))); } for (int i = 0; i < 256; ++i) { System.out.print(hasher.hash(i) & 1); } System.out.println(); analyzeBits(hasher); } private static void analyzeBits(Hasher hasher) { final int N = 65536; final int maxrunlength=32; int[][] runs = {new int[maxrunlength], new int[maxrunlength]}; int[] count = new int[2]; int prev = -1; System.out.println("Run length test of "+N+" bits"); for (int i = 0; i < maxrunlength; ++i) { runs[0][i] = 0; runs[1][i] = 0; } int runlength_minus1 = 0; for (int i = 0; i < N; ++i) { int b = hasher.hash(i) & 0x1; count[b]++; if (b == prev) ++runlength_minus1; else if (i > 0) { ++runs[prev][runlength_minus1]; runlength_minus1 = 0; } prev = b; } ++runs[prev][runlength_minus1]; System.out.println(String.format("%d zeros, %d ones", count[0], count[1])); for (int i = 0; i < maxrunlength; ++i) { System.out.println(String.format("%d runs of %d zeros, %d runs of %d ones", runs[0][i], i+1, runs[1][i], i+1)); } } public static void main(String[] args) { Hasher[] hashers = { new MultiplicativeHasher1(), GF32Hasher.CRC32, new QuickHasher(), new CRC32TableHasher(), new TrigHasher() }; for (Hasher hasher : hashers) { test(hasher); } } } 
+1
source

HashMap , as well as hash- based utilities from Guava, use the following method for hashCode() results to improve bit allocation and protection against weaker hash functions:

  /* * This method was written by Doug Lea with assistance from members of JCP * JSR-166 Expert Group and released to the public domain, as explained at * http://creativecommons.org/licenses/publicdomain * * As of 2010/06/11, this method is identical to the (package private) hash * method in OpenJDK 7 java.util.HashMap class. */ static int smear(int hashCode) { hashCode ^= (hashCode >>> 20) ^ (hashCode >>> 12); return hashCode ^ (hashCode >>> 7) ^ (hashCode >>> 4); } 
+4
source

So, I read this question, thought hmm, this is a pretty mathematical question, maybe from my league. Then I spent so much time thinking about it, that I really think I have the answer: no function can satisfy the criteria that f (n) and 1 are non-periodic for consecutive values โ€‹โ€‹of n.

I hope someone will tell me how ridiculous my reasoning is, but until then I think itโ€™s right.

Here: Any binary integer n can be represented as 1 ... 0 or 1 ... 1, and only the least significant bit of this bitmap will affect the result of n and 1. Next, the next consecutive integer n + 1 will always contain the opposite least significant bit. Thus, it is obvious that any series of consecutive integers will have period 2 when passing the functions n and 1. So, is there any function f (n) that will suffice to distribute a series of consecutive integers, so that periodicity is excluded?

Any function f (n) = n + c fails, since c must end with either 0 or 1, so the LSB will either flip over or remain unchanged depending on the chosen constant.

The above also excludes subtraction for all trivial cases, but I have not yet found time to analyze the transfer behavior, so there might be a crack.

Any function f (n) = c * n fails because LSB will always be 0 if c ends with 0 and will always be equal to LSB n if c ends with 1.

Any function f (n) = n ^ c fails, by similar reasoning. The power function will always have the same LSB as n.

Any function f (n) = c ^ n fails for the same reason.

The unit and module were a little less intuitive for me, but basically LSB of any of the options is ultimately determined by subtraction (already excluded). The module will also obviously have a period equal to the divisor.

Unfortunately, I do not have the rigor needed to prove this, but I believe that any combination of the above operations will also ultimately fail. This makes me think that we can exclude any transcendental function, because they are implemented using polynomials (the Taylor series is not a terminological guy).

Finally, I gave up hope of traveling by train home, believing that the bit would work; however, in reality it is also a periodic function. The way I thought about this is to take the sum of the digits of any decimal number. This amount will obviously work from 0 to 9, then reset to 1, start from 1 to 10, and then drop to 2 ... This has a period, the range just keeps changing higher, the higher we calculate. In fact, we can do the same for the sum of binary digits, and in this case we get something like: 0,1,1,2,2, ... 5,5,6,6,7,7,8 , 8....

I left nothing?

TL DR I do not think your question has an answer.

+2
source

[SO decided to translate my "trivial answer" to a comment. Attempting to add small text to it to see if it can be fooled]

Unless you need a hash ranger wider.

The NumberOfSetBits function seems to be a lot different than the hash code and, as such, seems more appropriate for your needs. It turns out that there is a fairly efficient algorithm on SO.

See Best Algorithm for Counting the Number of Bits in a 32-Bit Integer .

+1
source

Source: https://habr.com/ru/post/910280/


All Articles