Note: The following SO questions are related, but neither they nor the resources they link to seem to fully answer my questions, especially regarding equality tests for collections of objects.
- Recommendations for Overriding -isEqual: and -hash
- Methods for implementing -hash on Cocoa mutable objects
Background
NSObject provides default implementations of -hash (which returns the address of the instance, like (NSUInteger)self) and -isEqual: (which returns NO unless the receiver and the parameter have the same address). These methods are designed to be overridden as necessary, but the documentation makes it clear that you should provide both or neither. Further, if -isEqual: returns YES for two objects, then the result of -hash for those objects must be the same. If not, problems can occur when objects that should be the same (for example, two string instances for which -compare: returns NSOrderedSame) are added to a Cocoa collection or compared directly.
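The difference between the default behavior and an overridden pair can be observed directly. The following is a minimal sketch, assuming ARC and Foundation; the function name DemoDefaultVersusOverriddenEquality is hypothetical and exists only for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical demo function: returns YES if every observation below holds.
static BOOL DemoDefaultVersusOverriddenEquality(void) {
    NSObject *first = [[NSObject alloc] init];
    NSObject *second = [[NSObject alloc] init];
    // NSObject's -isEqual: is an identity test, so two distinct
    // instances are never equal, while an instance equals itself.
    BOOL identityOnly = [first isEqual:first] && ![first isEqual:second];

    // NSString overrides both methods, so content decides the outcome.
    NSString *literal = @"hello";
    NSMutableString *built = [NSMutableString stringWithString:@"hel"];
    [built appendString:@"lo"];
    BOOL contentEqual = (literal != built) && [literal isEqual:built];
    // Equal objects are required to report equal hashes.
    BOOL hashesAgree = ([literal hash] == [built hash]);

    return identityOnly && contentEqual && hashesAgree;
}
```

Calling DemoDefaultVersusOverriddenEquality() should return YES: the two NSObject instances differ by identity, while the two strings are distinct objects that compare equal and hash identically.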
Context
I am developing CHDataStructures.framework, an open source library of Objective-C data structures. I have implemented a number of collections and am currently refining and enhancing their functionality. One of the features I want to add is the ability to compare collections for equality with one another.
Rather than comparing only memory addresses, these comparisons should take into account the objects present in the two collections (including order, if applicable). This approach has precedent in Cocoa, which generally uses a separate, type-specific method, for example -[NSArray isEqualToArray:], -[NSDictionary isEqualToDictionary:], -[NSSet isEqualToSet:], and -[NSString isEqualToString:].
I want my custom collections to be robust under equality tests, so that they can safely (and predictably) be added to other collections, and so that other classes (like NSSet) can determine whether two collections are equal, equivalent, or duplicates.
Problems
An -isEqualTo...: method works fine on its own, but classes that define such methods usually also override -isEqual: to call [self isEqualTo...:] if the parameter is of the same class (or possibly a subclass) as the receiver, or [super isEqual:] otherwise. This means that the class must also define -hash so that it returns the same value for distinct instances that have the same content.
In addition, Apple's documentation for -hash states the following: (emphasis mine)
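The dispatch pattern described above can be sketched as follows. This is only an illustration, assuming ARC: DemoStack, -pushObject:, and -isEqualToStack: are hypothetical names, not part of CHDataStructures, and the count-based -hash is one deliberately cheap choice among many.

```objc
#import <Foundation/Foundation.h>

// Hypothetical ordered collection, used only to illustrate the pattern.
@interface DemoStack : NSObject
- (void)pushObject:(id)object;
- (BOOL)isEqualToStack:(DemoStack *)other;
@end

@implementation DemoStack {
    NSMutableArray *_objects;
}

- (instancetype)init {
    if ((self = [super init])) {
        _objects = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)pushObject:(id)object {
    [_objects addObject:object];
}

// Type-specific comparison: same elements in the same order.
- (BOOL)isEqualToStack:(DemoStack *)other {
    if (other == nil) return NO;
    return [_objects isEqualToArray:other->_objects];
}

// Generic entry point: forward to the typed method when the classes match,
// otherwise fall back to NSObject's identity comparison.
- (BOOL)isEqual:(id)object {
    if ([object isKindOfClass:[DemoStack class]]) {
        return [self isEqualToStack:object];
    }
    return [super isEqual:object];
}

// Equal stacks always have equal counts, so this agrees with -isEqual:.
// Unequal stacks of the same size collide, which is permitted.
- (NSUInteger)hash {
    return [_objects count];
}
@end
```

With this in place, two DemoStack instances holding the same elements in the same order compare equal through -isEqual: and report the same -hash, while comparing a stack with an unrelated object such as an NSString returns NO.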
"If a mutable object is added to a collection that uses hash values to determine the object's position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object's internal state information or you must make sure the object's internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)"
Edit: I definitely understand why this is necessary and fully agree with the reasoning. I mentioned it here to provide additional context, and skipped the explanation of why this is the case for the sake of brevity.
All my collections are mutable, and the hash will have to consider at least some of the contents, so the only option here is to treat it as a programming error to mutate a collection while it is stored in another collection. (My collections all adopt NSCopying, so collections like NSDictionary can successfully make a copy to use as a key, etc.)
It makes sense for me to implement -isEqual: and -hash, because (for example) an indirect user of one of my classes may not know the specific -isEqualTo...: method to call, or even care whether two objects are instances of the same class. They should be able to call -isEqual: or -hash on any variable of type id and get the expected result.
Unlike -isEqual: (which has access to both instances being compared), -hash must return a result "blindly", with access only to the data within a particular instance. Since it cannot know what the hash will be used for, the result must be consistent for all possible instances that should be considered equal or identical, and must always agree with -isEqual:. (Edit: This was debunked by the answers below, and it certainly makes life easier.) Also, writing good hash functions is not trivial. Guaranteeing uniqueness is a challenge, especially when you only have an NSUInteger (32 or 64 bits) in which to represent the result.
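The key-copying behavior mentioned in the parenthetical can be demonstrated with Foundation types alone. This is a small sketch, assuming ARC; the function name DemoLookupAfterMutatingKey is hypothetical, and NSMutableString stands in for any mutable class that adopts NSCopying.

```objc
#import <Foundation/Foundation.h>

// Hypothetical demo function: stores a value under a mutable key, mutates the
// original key object, then looks the value up by the key's original content.
static NSNumber *DemoLookupAfterMutatingKey(void) {
    NSMutableString *key = [NSMutableString stringWithString:@"alpha"];
    NSMutableDictionary *table = [NSMutableDictionary dictionary];

    // -setObject:forKey: stores a copy of the key, not the key object itself.
    [table setObject:@1 forKey:key];

    // Mutating the original afterwards does not disturb the stored copy.
    [key appendString:@"-changed"];

    return [table objectForKey:@"alpha"];
}
```

The lookup still finds the stored number, because the dictionary holds its own copy of the key. NSSet and NSArray, by contrast, retain their members rather than copying them, so a mutable member of a set is the case that has to be treated as a programming error.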
Questions
- Are there best practices for implementing equality comparisons and -hash for collections?
- Are there any peculiarities to plan for in Objective-C and Cocoa collections?
- Are there any good approaches to unit testing -hash with a reasonable degree of confidence?
- Any suggestions for implementing -hash so that it agrees with -isEqual: for collections containing elements of arbitrary types? What pitfalls should I know about? (Edit: Not as problematic as I first thought. As @kperryua points out, "equal -hash values do not imply -isEqual:".)
Edit: I should have clarified that I'm not confused about how to implement -isEqual: or -isEqualTo...: for collections; that part is straightforward. I think my confusion stemmed mainly from the (erroneous) belief that -hash MUST return a different value if -isEqual: returns NO. Having done cryptography in the past, I thought that hashes for different values MUST be different. However, the answers below helped me understand that a "good" hash function is really about minimizing collisions and chaining for collections that use -hash. Although unique hashes are preferable, they are not a strict requirement.
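To make the one-directional contract concrete, here is a hedged sketch of two helpers, assuming ARC. Both names (DemoOrderedHash and DemoHashAgreesWithEquality) are hypothetical, and the multiply-and-add combination with the constants 17 and 31 is a common general-purpose technique chosen for illustration, not something prescribed by Cocoa.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: combines element hashes in iteration order, so that
// collections with equal elements in equal order produce equal results.
// Unsigned arithmetic wraps on overflow, which is well defined in C.
static NSUInteger DemoOrderedHash(id<NSFastEnumeration> elements) {
    NSUInteger result = 17;
    for (id element in elements) {
        result = result * 31 + [element hash];
    }
    return result;
}

// Hypothetical unit-test helper: checks only the required direction.
// Equal objects must have equal hashes; unequal objects may collide.
static BOOL DemoHashAgreesWithEquality(id a, id b) {
    return ![a isEqual:b] || [a hash] == [b hash];
}
```

A unit test can run DemoHashAgreesWithEquality over many pairs of instances, including equal pairs built in different ways. It can never fail merely because two unequal objects share a hash, which matches the conclusion above. Note that DemoOrderedHash visits every element, so it trades hashing speed for fewer collisions, whereas returning only the element count is cheaper but collides more often.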