Reverse Engineering String.GetHashCode

The behavior of String.GetHashCode depends on the architecture the program runs under, so the same string yields one value on x86 and a different one on x64. I have a test application that must run on x86, and it needs to predict the hash codes produced by an application that runs on x64.

The following is the disassembly of the implementation of String.GetHashCode from mscorwks.

    public override unsafe int GetHashCode()
    {
        fixed (char* text1 = ((char*) this))
        {
            char* chPtr1 = text1;
            int num1 = 0x15051505;
            int num2 = num1;
            int* numPtr1 = (int*) chPtr1;
            for (int num3 = this.Length; num3 > 0; num3 -= 4)
            {
                num1 = (((num1 << 5) + num1) + (num1 >> 0x1b)) ^ numPtr1[0];
                if (num3 <= 2)
                {
                    break;
                }
                num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr1[1];
                numPtr1 += 2;
            }
            return (num1 + (num2 * 0x5d588b65));
        }
    }

Can someone port this function to a safe implementation?

+4
4 answers

Hash codes are not meant to be repeatable across different platforms, or even across multiple runs of the same program on the same system. You are going the wrong way. If you do not change course, your path will be difficult, and one day it may end in tears.

What is the real problem you want to solve? Could you write your own hash function, either as an extension method or as the GetHashCode implementation of a wrapper class, and use that instead?
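To make the "write your own hash function" suggestion concrete, here is a minimal sketch of a platform-independent string hash written as an extension method. It uses 32-bit FNV-1a over the UTF-8 bytes of the string, which is my own choice for illustration and not something proposed in this thread; the names StableStringHash and GetStableHashCode are hypothetical.

```csharp
using System.Text;

public static class StableStringHash
{
    // Hypothetical helper: 32-bit FNV-1a over the UTF-8 bytes of the string.
    // The result depends only on the string contents, not on the process
    // bitness or the runtime version.
    public static int GetStableHashCode(this string value)
    {
        unchecked // FNV-1a relies on wrap-around multiplication
        {
            uint hash = 2166136261u;            // FNV offset basis
            foreach (byte b in Encoding.UTF8.GetBytes(value))
            {
                hash ^= b;                      // mix in the next byte
                hash *= 16777619u;              // multiply by the FNV prime
            }
            return (int)hash;
        }
    }
}
```

Because the algorithm lives in your own code, a value computed by an x86 process can be compared directly with one computed by an x64 process, for example via "abc".GetStableHashCode().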

+20

First, John is right; this is a fool's errand. The internal debug builds of the framework that we use to "eat our own dogfood" change the hash algorithm every day, precisely to prevent people from building systems (even test systems) that rely on implementation details that are documented as being subject to change at any time.

Rather than fixating on emulating a system that is documented as unsuitable for emulation, my recommendation would be to take a step back and ask yourself why you are trying to do something so dangerous. Is this really a requirement?

Secondly, StackOverflow is a technical question-and-answer site, not a "do my work for me for free" site. If you are hell-bent on doing this dangerous thing and you need someone who can rewrite unsafe code into equivalent safe code, then I recommend that you hire someone who can do this for you.

+16

Although all the warnings given here are valid, they do not answer the question. I had a situation in which GetHashCode() had, unfortunately, already been used for persisted values in production, and I had no choice but to re-implement it using the default .NET 2.0 32-bit (x86) algorithm. I re-coded it without unsafe code, as shown below, and it seems to work. Hope this helps someone.

    // The GetStringHashCode() extension method is equivalent to the Microsoft .NET Framework 2.0
    // String.GetHashCode() method executed on 32 bit systems.
    public static int GetStringHashCode(this string value)
    {
        unchecked // the hash relies on wrap-around integer arithmetic
        {
            int hash1 = (5381 << 16) + 5381;
            int hash2 = hash1;
            int len = value.Length;
            int intval;
            int c0, c1;
            int i = 0;
            while (len > 0)
            {
                c0 = (int)value[i];
                // An odd-length string has no second character here; the unsafe
                // original reads the null terminator (0) in that position.
                c1 = len > 1 ? (int)value[i + 1] : 0;
                intval = c0 | (c1 << 16);
                hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ intval;
                if (len <= 2)
                {
                    break;
                }
                i += 2;
                c0 = (int)value[i];
                c1 = len > 3 ? (int)value[i + 1] : 0;
                intval = c0 | (c1 << 16);
                hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ intval;
                len -= 4;
                i += 2;
            }
            return hash1 + (hash2 * 1566083941);
        }
    }
+3

The following accurately reproduces the default String hash codes on .NET 4.7 (and probably earlier). This is the hash code given by:

  • The default on a String instance: "abc".GetHashCode()
  • StringComparer.Ordinal.GetHashCode("abc")
  • Various String methods that accept the StringComparison.Ordinal enumeration value.
  • System.Globalization.CompareInfo.GetStringComparer(CompareOptions.Ordinal)

When tested on release builds with full JIT optimization, these versions modestly outperform the built-in .NET code, and they have also been heavily tested for exact equivalence with .NET behavior. Note that there are separate versions for x86 and x64. Your program should include both; below the respective code listings is a calling harness that selects the appropriate version at runtime.

x86 - (.NET running in 32-bit mode)

    static unsafe int GetHashCode_x86_NET(int* p, int c)
    {
        int h1, h2 = h1 = 0x15051505;

        while (c > 2)
        {
            h1 = ((h1 << 5) + h1 + (h1 >> 27)) ^ *p++;
            h2 = ((h2 << 5) + h2 + (h2 >> 27)) ^ *p++;
            c -= 4;
        }

        if (c > 0)
            h1 = ((h1 << 5) + h1 + (h1 >> 27)) ^ *p++;

        return h1 + (h2 * 0x5d588b65);
    }

x64 - (.NET running in 64-bit mode)

    static unsafe int GetHashCode_x64_NET(Char* p)
    {
        int h1, h2 = h1 = 5381;

        while (*p != 0)
        {
            h1 = ((h1 << 5) + h1) ^ *p++;

            if (*p == 0)
                break;

            h2 = ((h2 << 5) + h2) ^ *p++;
        }
        return h1 + (h2 * 0x5d588b65);
    }

Calling harness / extension method for either platform (x86 / x64):

    readonly static int _hash_sz = IntPtr.Size == 4 ? 0x2d2816fe : 0x162a16fe;

    public static unsafe int GetStringHashCode(this String s)
    {
        // Note: x64 string hash ignores remainder after embedded '\0' char (unlike x86)
        if (s.Length == 0 || (IntPtr.Size == 8 && s[0] == '\0'))
            return _hash_sz;

        fixed (char* p = s)
            return IntPtr.Size == 4 ?
                GetHashCode_x86_NET((int*)p, s.Length) :
                GetHashCode_x64_NET(p);
    }
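Since the original question asked for a safe implementation that an x86 process could use to predict x64 hash codes, here is a rough sketch of the x64 routine above without pointers. It assumes the GetHashCode_x64_NET listing really does match the target runtime; the class and method names are hypothetical, and the end of the string is treated the same way as the null terminator that the unsafe version reads.

```csharp
public static class X64StringHashSketch
{
    // Hypothetical safe port of GetHashCode_x64_NET: no unsafe code, and it
    // does not depend on the bitness of the process that runs it.
    public static int GetStringHashCodeX64Safe(this string s)
    {
        unchecked // the hash relies on wrap-around integer arithmetic
        {
            int h1 = 5381, h2 = 5381;

            // Like the unsafe version, stop at the first '\0' character.
            for (int i = 0; i < s.Length && s[i] != '\0'; i += 2)
            {
                h1 = ((h1 << 5) + h1) ^ s[i];

                // The end of the string stands in for the null terminator.
                if (i + 1 >= s.Length || s[i + 1] == '\0')
                    break;

                h2 = ((h2 << 5) + h2) ^ s[i + 1];
            }
            return h1 + (h2 * 1566083941); // 1566083941 == 0x5d588b65
        }
    }
}
```

For an empty string, or a string that starts with '\0', this returns 0x162a16fe, the same x64 constant the harness uses for that case. It only predicts real hash codes as long as the target runtime actually uses this algorithm.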
0

Source: https://habr.com/ru/post/1384226/

