Suitable hash code methods for byte array?

What is the best hashing method for a byte array?

The arrays are serialized class objects containing a JPEG image, transmitted between applications over TCP/IP.

Each array is about 200,000 bytes long.

+6
4 answers

Any built-in hash function should do; depending on how much you care about collisions, these are your options (from most collision-prone to least):

  • MD5
  • SHA1
  • SHA256
  • SHA384
  • SHA512

They are as easy to use as:

 var hash = SHA1.Create().ComputeHash(data); 
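As a slightly fuller sketch of the same idea: the class and method names below (`ImageHashing`, `Sha256Hex`) are made up for illustration and are not from the answer. It swaps in SHA256 from the list above, disposes the hash object, and returns the digest as a hex string so it can be logged or compared.

```csharp
using System;
using System.Security.Cryptography;

public static class ImageHashing
{
    // Hypothetical helper: returns the SHA-256 digest of the data
    // as a lowercase hex string.
    public static string Sha256Hex(byte[] data)
    {
        // HashAlgorithm implements IDisposable, so wrap it in using.
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(data);
            // BitConverter.ToString yields "AB-CD-..."; strip the dashes.
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Two arrays with the same content give the same string, so the sender and receiver can each compute it and compare.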

Bonus points: If you don't care about security (which I don't think you do, given that you are hashing images), you can look at MurmurHash, which is designed for content hashing rather than secure hashing (and is therefore much faster). However, it is not in the framework, so you will need to find an implementation (and you should probably go for Murmur3).
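For reference, here is a minimal sketch of the 32-bit Murmur3 variant (MurmurHash3 x86_32), written from the public algorithm description. The class name `Murmur3Sketch` and method name `Hash32` are placeholders, and the code assumes a little-endian platform because it reads blocks with `BitConverter.ToUInt32`. Treat it as an illustration and check it against a maintained implementation before relying on it.

```csharp
using System;

public static class Murmur3Sketch
{
    private static uint Rotl(uint x, int r)
    {
        return (x << r) | (x >> (32 - r));
    }

    // Sketch of MurmurHash3 x86_32; not a vetted implementation.
    public static uint Hash32(byte[] data, uint seed = 0)
    {
        const uint c1 = 0xcc9e2d51;
        const uint c2 = 0x1b873593;
        unchecked
        {
            uint h = seed;
            int blocks = data.Length / 4;

            // Body: mix one 4-byte block at a time.
            for (int i = 0; i < blocks; i++)
            {
                uint k = BitConverter.ToUInt32(data, i * 4); // assumes little-endian
                k *= c1; k = Rotl(k, 15); k *= c2;
                h ^= k; h = Rotl(h, 13); h = h * 5 + 0xe6546b64;
            }

            // Tail: fold in the remaining 1 to 3 bytes, if any.
            uint tail = 0;
            int offset = blocks * 4;
            switch (data.Length & 3)
            {
                case 3: tail ^= (uint)data[offset + 2] << 16; goto case 2;
                case 2: tail ^= (uint)data[offset + 1] << 8; goto case 1;
                case 1:
                    tail ^= data[offset];
                    tail *= c1; tail = Rotl(tail, 15); tail *= c2;
                    h ^= tail;
                    break;
            }

            // Finalization: mix in the length and avalanche the bits.
            h ^= (uint)data.Length;
            h ^= h >> 16; h *= 0x85ebca6b;
            h ^= h >> 13; h *= 0xc2b2ae35;
            h ^= h >> 16;
            return h;
        }
    }
}
```

The result can be cast to `int` with `unchecked((int)value)` if it is meant to feed a `GetHashCode` implementation.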

Edit: If you are looking for a hash code (as in GetHashCode) for a byte[] array, it is entirely up to you. It usually consists of bit shifting (by primes) and XORing. For example:

    using System.Collections.Generic;

    public class ByteArrayEqualityComparer : IEqualityComparer<byte[]>
    {
        public static readonly ByteArrayEqualityComparer Default = new ByteArrayEqualityComparer();

        private ByteArrayEqualityComparer() { }

        public bool Equals(byte[] x, byte[] y)
        {
            if (x == null && y == null) return true;
            if (x == null || y == null) return false;
            if (x.Length != y.Length) return false;
            for (var i = 0; i < x.Length; i++)
                if (x[i] != y[i]) return false;
            return true;
        }

        public int GetHashCode(byte[] obj)
        {
            if (obj == null || obj.Length == 0) return 0;
            // Use uint so that >> is a logical shift and the rotate is a true rotate.
            uint hashCode = 0;
            for (var i = 0; i < obj.Length; i++)
                // Rotate left by 3 bits, then XOR in the next byte.
                hashCode = ((hashCode << 3) | (hashCode >> 29)) ^ obj[i];
            return unchecked((int)hashCode);
        }
    }

    // ...
    var hc = ByteArrayEqualityComparer.Default.GetHashCode(data);

Edit: If you want to validate that the value has not changed, you should use CRC32.

+9

Jon Skeet has a good answer on how to override GetHashCode. It is based on general, effective hashing techniques: you start with a prime number, then for each component multiply the running hash by another prime number and add that component's hash code, allowing overflow.

In your case, you would do:

    static int GetByteArrayHashCode(byte[] array)
    {
        unchecked
        {
            int hash = 17;

            // Cycle through each element in the array.
            foreach (var value in array)
            {
                // Update the hash.
                hash = hash * 23 + value.GetHashCode();
            }

            return hash;
        }
    }

Note that Jon's answer explains why this is better than XORing the hashes of the individual elements (and that anonymous types in C# currently do not XOR the hashes of the individual elements, but use something similar to the above).

Although this will be faster than the hash algorithms from the System.Security.Cryptography namespace (since you are dealing with smaller hashes), the disadvantage is that you may have more collisions.

You will have to test against your data and determine how often you get collisions, weighed against the work that needs to be done when a collision occurs.
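One way to run that test is sketched below. The names `CollisionCheck` and `CountCollisions` are invented for this example: the method takes a set of sample arrays and any hash function, and counts how many arrays share a hash code with an earlier array that has different content.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollisionCheck
{
    // Hypothetical helper: counts samples whose hash code was already
    // produced by earlier samples, none of which has the same content.
    public static int CountCollisions(IEnumerable<byte[]> samples, Func<byte[], int> hash)
    {
        var seen = new Dictionary<int, List<byte[]>>();
        int collisions = 0;

        foreach (var sample in samples)
        {
            int code = hash(sample);
            if (!seen.TryGetValue(code, out var bucket))
            {
                // First time this hash code appears.
                seen[code] = bucket = new List<byte[]>();
            }
            else if (!bucket.Any(other => other.SequenceEqual(sample)))
            {
                // Same hash code, different bytes: a real collision.
                collisions++;
            }
            bucket.Add(sample);
        }

        return collisions;
    }
}
```

Calling it as `CollisionCheck.CountCollisions(images, GetByteArrayHashCode)` on a representative batch of images gives a rough idea of whether the cheap hash is good enough for the data at hand.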

+4

Any of the built-in cryptographic hash functions should work. Not sure about the speed. Maybe MD5?

+2

Based on the compiler-generated GetHashCode():

    public static int GetHashCode(byte[] array)
    {
        unchecked
        {
            int i = 0;
            int hash = 17;
            // Largest multiple of 4 that fits in the array length.
            int rounded = array.Length & ~3;

            hash = 31 * hash + array.Length;

            // Hash four bytes at a time.
            for (; i < rounded; i += 4)
            {
                hash = 31 * hash + BitConverter.ToInt32(array, i);
            }

            // Pack the remaining 1 to 3 bytes into a single value.
            if (i < array.Length)
            {
                int val = array[i];
                i++;

                if (i < array.Length)
                {
                    val |= array[i] << 8;
                    i++;

                    if (i < array.Length)
                    {
                        val |= array[i] << 16;
                    }
                }

                hash = 31 * hash + val;
            }

            return hash;
        }
    }

Ah... and a link to a C# MurmurHash implementation: http://landman-code.blogspot.com/2009/02/c-superfasthash-and-murmurhash2.html

+2

Source: https://habr.com/ru/post/896207/

