Why does the initial hash value in the GetHashCode() implementation generated for an anonymous class depend on property names?

When generating a GetHashCode() implementation for an anonymous class, Roslyn calculates the initial hash value based on the property names. For example, the class generated for

 var x = new { Int = 42, Text = "42" }; 

will have the following GetHashCode() method:

 public override int GetHashCode()
 {
     int hash = 339055328;
     hash = hash * -1521134295 + EqualityComparer<int>.Default.GetHashCode(Int);
     hash = hash * -1521134295 + EqualityComparer<string>.Default.GetHashCode(Text);
     return hash;
 }

But if we change the property names, the initial value will change:

 var x = new { Int2 = 42, Text2 = "42" };

 public override int GetHashCode()
 {
     int hash = 605502342;
     hash = hash * -1521134295 + EqualityComparer<int>.Default.GetHashCode(Int2);
     hash = hash * -1521134295 + EqualityComparer<string>.Default.GetHashCode(Text2);
     return hash;
 }

What is the reason for this behavior? Is there any problem with choosing one large (prime?) number and using it for all anonymous classes?

2 answers

Is there any problem with choosing one large (prime?) number and using it for all anonymous classes?

There is nothing wrong with that; it simply produces a less effective hash value.

The goal of a GetHashCode implementation is to return, as far as possible, different results for values that are not equal. This reduces the chance of collisions when the values are used in hash-based collections (e.g. Dictionary<TKey, TValue>).

An anonymous value can never be equal to another anonymous value if they are of different types. The type of an anonymous value is determined by the shape of its properties:

  • Property Name
  • Property Type
  • Number of properties

Two anonymous values that differ in any of these characteristics are of different types and therefore can never be equal.

Given this, it makes sense for the compiler to generate GetHashCode implementations that tend to return different values for different types. This is why the compiler includes the property names when calculating the initial hash.
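To see the point about types in practice, here is a minimal sketch. The class name AnonymousHashDemo is made up for illustration, and the last comparison is only expected, not guaranteed, to print True, since unequal values are allowed to collide.

```csharp
using System;

public static class AnonymousHashDemo
{
    public static void Main()
    {
        var first = new { Int = 42, Text = "42" };
        var second = new { Int = 42, Text = "42" };
        var renamed = new { Int2 = 42, Text2 = "42" };

        // Same property names, types, and order: the compiler reuses one type,
        // so the two values are equal and their hash codes must match.
        Console.WriteLine(first.GetType() == second.GetType());
        Console.WriteLine(first.Equals(second));
        Console.WriteLine(first.GetHashCode() == second.GetHashCode());

        // Different property names: a different type, so the values are never equal.
        Console.WriteLine(first.GetType() == renamed.GetType());
        Console.WriteLine(first.Equals(renamed));

        // The name-dependent seed makes differing hash codes likely here,
        // but nothing guarantees it.
        Console.WriteLine(first.GetHashCode() != renamed.GetHashCode());
    }
}
```

With one shared seed, the last line would always compare equal hash codes for these two values, because both hold the same data.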


Unless someone from the Roslyn team weighs in, we can only speculate. I would do the same. Using a different seed for each anonymous type seems like a useful way to get more variation in the hash codes. For example, it makes new { a = 1 }.GetHashCode() != new { b = 1 }.GetHashCode() true.

I also wonder whether there are any bad seeds that cause the hash code to degrade. I do not think so. Even a seed of 0 will work.

The Roslyn source code can be found in AnonymousTypeGetHashCodeMethodSymbol. The initial value of the hash code is based on a hash of the anonymous type's property names.
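As a rough illustration of a name-based seed, the sketch below folds a hash of each property name into a running value, using the multiplier from the generated code in the question. Everything else is an assumption: SeedSketch, HashName, and ComputeSeed are hypothetical names, and the FNV-1a name hash is a stand-in, so the results are not expected to match the constants the compiler actually emits.

```csharp
using System;

public static class SeedSketch
{
    // The multiplier that appears in the generated GetHashCode() shown in the question.
    private const int Factor = -1521134295;

    // Hypothetical stand-in for the compiler's internal name hash:
    // 32-bit FNV-1a over the UTF-16 characters of the name.
    public static int HashName(string name)
    {
        unchecked
        {
            uint hash = 2166136261u;
            foreach (char c in name)
            {
                hash = (hash ^ c) * 16777619u;
            }
            return (int)hash;
        }
    }

    // Folds the name hashes together in declaration order, starting from 0.
    public static int ComputeSeed(params string[] propertyNames)
    {
        unchecked
        {
            int seed = 0;
            foreach (string name in propertyNames)
            {
                seed = seed * Factor + HashName(name);
            }
            return seed;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ComputeSeed("Int", "Text"));
        Console.WriteLine(ComputeSeed("Int2", "Text2"));
    }
}
```

Because the fold is order-sensitive, swapping two property names would normally give a different seed as well, which fits the fact that property order is part of an anonymous type's identity.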


Source: https://habr.com/ru/post/1232396/