HashSets do not store unique elements if you change their identity

When working with HashSets in C#, I recently ran into an annoying problem: HashSets do not guarantee the uniqueness of their elements; they are not sets. What they do guarantee is that when Add(T item) is called, the item is not added if item.Equals(that) is true for any element already in the set. This no longer holds if you manipulate elements that are already in the set. A small program that demonstrates this (copied from my LINQPad):

    void Main() {
        HashSet<Tester> testset = new HashSet<Tester>();
        testset.Add(new Tester(1));
        testset.Add(new Tester(2));
        foreach(Tester tester in testset){
            tester.Dump();
        }
        foreach(Tester tester in testset){
            tester.myint = 3;
        }
        foreach(Tester tester in testset){
            tester.Dump();
        }
        HashSet<Tester> secondhashset = new HashSet<Tester>(testset);
        foreach(Tester tester in secondhashset){
            tester.Dump();
        }
    }

    class Tester{
        public int myint;

        public Tester(int i){
            this.myint = i;
        }

        public override bool Equals(object o){
            if (o == null) return false;
            Tester that = o as Tester;
            if (that == null) return false;
            return (this.myint == that.myint);
        }

        public override int GetHashCode(){
            return this.myint;
        }

        public override string ToString(){
            return this.myint.ToString();
        }
    }

It happily lets me manipulate the elements in the collection so that they become equal, and only filters them out when a new HashSet is built from it. What is recommended when I want to work with sets where I need to know that the entries are unique? Roll my own, where Add(T item) adds a copy of the item and the enumerator hands out copies of the contained elements? This has the problem that every contained element must be deep-copyable, at least in the members that affect its equality.

Another solution would be to roll my own and accept only elements that implement INotifyPropertyChanged, reacting to that event to re-check for equality, but this seems very restrictive, not to mention the amount of work and the performance loss under the hood.

Another possible solution I thought of is to make sure that all fields are readonly or const and are set in the constructor. All of these solutions have major drawbacks. Do I have any other options?

+6
3 answers

You are really talking about object identity. If you are going to hash elements, they need some form of identity so that they can be compared.

  • If that identity can change, it is not a valid way of identifying the object. You currently have a public int myint. It really should be readonly and set only in the constructor.
  • If two objects are conceptually different (i.e. you want to treat them as different in your specific design), then their hash codes should be different.
  • If you have two objects with the same content (i.e. two value objects that have the same field values), then they should have the same hash code and should be equal.
  • If your data model says that two objects can have the same content but must not be equal, you should use a surrogate identifier, not a hash of the content.
  • Perhaps your objects should be immutable value types, so that the object cannot change.
  • If they are mutable types, you should assign a surrogate identifier (i.e. one that is introduced externally, such as an incrementing counter ID or the object's own hash code) that never changes for the given object.

This is a problem with your Tester objects, not with the set. You need to think hard about how you define identity. It is not an easy problem.
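A minimal sketch of the surrogate-identifier idea, not a definitive implementation. The class name TrackedTester, the Id property and the shared counter are assumptions made up for this illustration; only myint comes from the question. Equality and the hash code depend solely on the Id, so changing myint cannot make two elements of the set equal.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical variant of Tester: identity comes from a surrogate Id that is
// assigned once in the constructor; myint stays freely mutable.
public sealed class TrackedTester : IEquatable<TrackedTester>
{
    private static int nextId;   // shared counter (an assumption of this sketch)

    public int Id { get; }       // surrogate identity, never changes
    public int myint;            // mutable payload, not part of equality

    public TrackedTester(int i)
    {
        Id = Interlocked.Increment(ref nextId);
        myint = i;
    }

    public bool Equals(TrackedTester other) => other != null && Id == other.Id;
    public override bool Equals(object obj) => Equals(obj as TrackedTester);
    public override int GetHashCode() => Id;
    public override string ToString() => "#" + Id + ": " + myint;
}

public static class SurrogateIdDemo
{
    // Repeats the experiment from the question and returns the size of the copy.
    public static int CountAfterMutation()
    {
        var set = new HashSet<TrackedTester> { new TrackedTester(1), new TrackedTester(2) };
        foreach (var t in set) t.myint = 3;          // does not touch Id
        return new HashSet<TrackedTester>(set).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterMutation());     // 2: both elements stay distinct
    }
}
```

The trade-off is that two TrackedTester instances with the same myint are now always treated as different, which is only right if the data model really distinguishes them.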

+5

When I need a 1-dimensional collection of guaranteed unique elements, I usually go with Dictionary<TKey, TValue>: you cannot add elements with the same Key, plus I usually need to attach some properties to the elements, and the Value comes in handy for that (my go-to value type is Tuple<> for many values...).

Of course, this is neither the most efficient nor the least memory-hungry solution, but I usually do not have performance or memory concerns.
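A rough sketch of this approach, assuming an int key and a Tuple<string, int> value; both types and all sample data are made up for illustration. Uniqueness rests on the key alone, so replacing the value attached to a key cannot create duplicates.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryDemo
{
    // Builds a small dictionary; keys and tuple contents are sample data.
    public static Dictionary<int, Tuple<string, int>> Build()
    {
        var items = new Dictionary<int, Tuple<string, int>>();
        items.Add(1, Tuple.Create("first", 10));
        items.Add(2, Tuple.Create("second", 20));

        // Guard against duplicates explicitly instead of relying on the exception.
        if (!items.ContainsKey(1))
            items.Add(1, Tuple.Create("duplicate", 30));

        // Replacing the value leaves the key, and therefore uniqueness, untouched.
        items[2] = Tuple.Create("second", 99);
        return items;
    }

    public static void Main()
    {
        var items = Build();
        Console.WriteLine(items.Count);              // 2

        try
        {
            items.Add(1, Tuple.Create("again", 0));  // same key a second time
        }
        catch (ArgumentException)
        {
            Console.WriteLine("duplicate key rejected");
        }
    }
}
```

This only helps as long as the key itself cannot change: a mutable reference type with content-based equality used as a key would run into the same problem as the HashSet in the question.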

0

You should implement your own IEqualityComparer and pass it to the HashSet constructor to make sure you get the comparison behavior you want.
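A small sketch of passing a comparer to the constructor. The comparer name TesterByValueComparer is hypothetical, and the Tester class here is a stripped-down version of the one in the question without its Equals and GetHashCode overrides.

```csharp
using System;
using System.Collections.Generic;

// Stripped-down Tester: no Equals/GetHashCode overrides of its own.
public class Tester
{
    public int myint;
    public Tester(int i) { myint = i; }
}

// Hypothetical comparer: two Testers are equal when their myint values match.
public sealed class TesterByValueComparer : IEqualityComparer<Tester>
{
    public bool Equals(Tester x, Tester y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.myint == y.myint;
    }

    public int GetHashCode(Tester obj) => obj == null ? 0 : obj.myint;
}

public static class ComparerDemo
{
    // Adds three Testers, two of them with the same value, and returns the set size.
    public static int CountDistinct()
    {
        var set = new HashSet<Tester>(new TesterByValueComparer());
        set.Add(new Tester(1));
        set.Add(new Tester(1));   // rejected: the comparer reports it as equal
        set.Add(new Tester(2));
        return set.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinct());   // 2
    }
}
```

The comparer is only consulted when an element is added or looked up, so it moves the equality rule out of the class but does not by itself protect against later changes to myint.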

And as Joe said, if you want the collection to remain unique even beyond .Add(T item), you need to use value objects that are created through the constructor and have no publicly settable members. i.e.
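A minimal sketch of what such a value object might look like. The answer's own example is not preserved, so the class name ImmutableTester and the helper WithMyInt are assumptions made up for this illustration. Because the only state is fixed in the constructor, an element cannot become equal to another one after it has been added; "changing" a value means building a new object and a new set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable counterpart of Tester.
public sealed class ImmutableTester : IEquatable<ImmutableTester>
{
    public int MyInt { get; }   // get-only: set once in the constructor

    public ImmutableTester(int myInt) { MyInt = myInt; }

    // Returns a new instance instead of mutating this one.
    public ImmutableTester WithMyInt(int value) => new ImmutableTester(value);

    public bool Equals(ImmutableTester other) => other != null && MyInt == other.MyInt;
    public override bool Equals(object obj) => Equals(obj as ImmutableTester);
    public override int GetHashCode() => MyInt;
    public override string ToString() => MyInt.ToString();
}

public static class ValueObjectDemo
{
    // Sets every element to 3 by building a new set; duplicates collapse on Add.
    public static int CountAfterUpdate()
    {
        var set = new HashSet<ImmutableTester> { new ImmutableTester(1), new ImmutableTester(2) };
        var updated = new HashSet<ImmutableTester>();
        foreach (var t in set) updated.Add(t.WithMyInt(3));
        return updated.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterUpdate());   // 1: the two elements became equal
    }
}
```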

0

Source: https://habr.com/ru/post/920108/

