Implementation of -hash / -isEqual: / - isEqualTo ...: for Objective-C collections

Note. The following SO questions are related to each other, but neither they nor related resources seem to fully answer my questions, especially regarding running equality tests for collections of objects .

  • Recommendations for Overriding -isEqual: and -hash
  • Methods for implementing -hash on Cocoa mutable objects



Background

NSObject provides standard implementations of -hash (which returns the address of the instance, for example (NSUInteger)self ) and -isEqual: (which returns NO if the receiver addresses and parameter do not match). These methods are intended to be overridden as necessary, but the documentation clearly states that you must provide both or none of them. In addition, if -isEqual: returns YES for two objects, then the -hash result for these objects should be the same. If not, problems can occur when objects that should be the same, for example, two string instances for which -compare: returns NSOrderedSame -, are added to the Cocoa collection or compared directly.

Context

I am developing CHDataStructures.framework , an open source library of Objective-C data structures. I have implemented a number of collections, and currently I am improving and improving their functionality. One of the features I want to add is the ability to compare collections for equality with another.

Instead of comparing only memory addresses, these comparisons should take into account objects present in two collections (including order, if applicable). This approach has a precedent in Cocoa and usually uses a separate method, including the following:

I want my custom collections to be reliable for equality tests, so they can safely (and predictably) be added to other collections and allow others (like NSSet) to determine if two collections are equal / equivalent / duplicate.

Problems

The -isEqualTo...: method works fine on its own, but the classes that define these methods usually also override -isEqual: to call [self isEqualTo...:] if the parameter has the same class (or possibly a subclass ) as the receiver, or [super isEqual:] otherwise. This means that the class must also define -hash so that it returns the same value for disparate instances that have the same content.

In addition, Apple's documentation for -hash provides the following: (emphasis mine)

β€œIf a changed object is added to a collection that uses hash values ​​to determine the position of the object in the collection, the value returned by the hash method of the object should not change while the object is in the collection. Therefore, the hash method should not rely on any either information about the internal state of the object or , you must make sure that information about the internal state of the object does not change while the object is in the collection.for example, a mutable dictionary can be placed in a hash table, but you should not change it, while he’s there. (Note that it can be difficult to find out if the item is in the collection.) "

Edit: I definitely understand why this is necessary and is fully consistent with reasoning. I mentioned this here to provide additional context, and went around the topic of why this happens for the sake of brevity.

All my collections are mutable, and the hash will have to consider at least some content, so the only option here is to consider a programming error to mutate the collection stored in another collection. (My collections all accept NSCopying , so collections like NSDictionary can successfully make a copy for use as a key, etc.)

It makes sense for me to implement -isEqual: and -hash , because (for example) an indirect user of one of my classes may not know the specific method -isEqualTo...: to call or even take care of whether two objects are instances of the same class. They should be able to call -isEqual: or -hash for any variable of type id and get the expected result.

Unlike -isEqual: (which has access to two instances that are compared), -hash should return the result "blindly", having access only to data in a specific instance. Since it cannot know what the hash is for, the result should be consistent for all possible instances, which should be considered equal / identical and should always match -isEqual: hit>. (Edit: This was debunked by the answers below, and it certainly makes life easier.) Also, writing good hash functions is not trivial - guaranteeing uniqueness is a problem, especially when you only have NSUInteger (32/64 bit) in which to represent it .

Questions

  • Are there best practices for implementing equality comparisons -hash for collections?
  • Are there any features for planning in Objective-C and Cocoa collections?
  • Are there any good approaches to unit testing -hash with a reasonable degree of confidence?
  • Any suggestions for implementing -hash to match with -isEqual: for collections containing elements of arbitrary types? What mistakes should I know? ( Edit: Not as problematic as I thought at first, @kperryua points out, β€œ -hash values -hash not imply -isEqual: ".)



Edit: I should have clarified that I'm not confused about how to implement -isEqual: or -isEqualTo ...: for collections, it's simple. I think my confusion came mainly from the (erroneous) thought that -hash SHOULD return a different value if -isEqual: returns NO. Having done cryptography in the past, I thought that hashes for different values ​​MUST be different. However, the answers below helped me understand that a β€œgood” hash function is really about minimizing collisions and chaining chains for collections that use -hash . Although unique hashes are preferable, they are not a strict requirement.

+45
equality data-structures objective-c cocoa chdatastructures
Jul 10 '09 at 22:59
source share
3 answers

I think trying to come up with some useful hash function that will generate unique hash values ​​for collections is a futile exercise. The U62 proposal to combine hashes of all content will not scale well, as it makes the hash function O (n). Hash functions must really be O (1) to ensure good performance, otherwise the hash target will be defeated. (Consider the general construction of Cocoa plists, which are dictionaries containing arrays and other dictionaries, potentially anomalous. Trying to take a hash of a top-level dictionary of a large plist will be painfully slow if the hash functions of the collections were O (n).)

My suggestion is not to worry about the hash of the collection. As you said, -isEqual: means equal -hash values. On the other hand, equal -hash values do not mean -isEqual: This fact gives you many opportunities to create a simple hash.

If you are really concerned about collisions (and you have evidence in specific dimensions of real situations that confirm that something is bothering you), you can still follow the U62 guidelines to some extent. For example, you can take a hash, say, the first and / or last element in the collection, and combine this, for example, with -count collection. This is enough to provide a decent hash.

I hope that answers at least one of your questions.

Regarding # 1: Implementation -isEqual: pretty cut and dry. You list the contents and check isEqual: for each of the elements.

There is one thing that needs to be careful in that it can affect what you decide to do for the functions of your -hash collections. Customers in your collections should also understand the rules governing -isEqual: and -hash . If you use the -hash content in your -hash collection, your collection will break if the contents of " isEqual: and -hash do not match. This is a client error, of course, but this is another argument against dropping the -hash contents of the collection.

No. 2 is kind of vague. Not sure what you mean.

+18
Jul 11 '09 at 6:43
source share

Two collections should be considered equal if they contain the same elements, and, in addition, if the collections are ordered, the elements are in the same order.

As regards hashes for collections, it should be sufficient to combine the hashes of the elements in some way (XOR them or add them modulo). Note that while the rules state that two objects equal in accordance with IsEqual must return the same hash, the opposite is not true: although the hashes are unique, there is no need for a correct solution. Thus, an ordered collection should not take into account the order of the elements.

An excerpt from Apple's documentation is a necessary limitation. The object could not maintain the same hash value under the mutation, and also ensure that objects with the same value have the same hash. This applies to simple objects as well as collections. Of course, it usually matters that the hash of an object changes when it is inside a container that uses the hash to organize its elements. The result of all this is that mutable collections should not mutate if they are placed inside another container, but then none of them should have an object that has a true hash function.

+4
Jul 11 '09 at 0:26
source share

I did some research on the implementation of the hash of the NSArray and NSMutableArray file and (unless I understand something), it looks like Apple without following its own rules:

If a mutable object is added to the collection that uses hash values ​​to determine the position of the object in the collection, the return value must not be changed by the object's hash method while the object is in the collection. Therefore, either the hash method should not rely on any information about the internal state of the object, or you must make sure the information on the internal state of the object does not change, the object is in the collection. So, for example, a mutable dictionary can be placed in a hash table, but you should not modify it while it is in there. (Note that it can be difficult to see if a given object is actually in the collection.)

Here is my test code

 NSMutableArray* myMutableArray = [NSMutableArray arrayWithObjects:@"a", @"b", @"c", nil]; NSMutableArray* containerForMutableArray = [NSMutableArray arrayWithObject:myMutableArray]; NSUInteger hashBeforeMutation = [[containerForMutableArray objectAtIndex:0] hash]; [[containerForMutableArray objectAtIndex:0] removeObjectAtIndex:1]; NSUInteger hashAfterMutation = [[containerForMutableArray objectAtIndex:0] hash]; NSLog(@"Hash Before: %d", hashBeforeMutation); NSLog(@"Hash After : %d", hashAfterMutation); 

Output:

 Hash Before: 3 Hash After : 2 

So this is similar to the default implementation for the Hash method for both NSArray and NSMutableArray - this is an array count, and it does not care if it is inside the collection or not.

+3
Jul 10 2018-12-12T00:
source share



All Articles