Overriding hashCode() consistently with equals() when equals() uses a similarity metric

Question


Say I have a Car class with the fields color and model. I need to store cars in a collection that holds no duplicates (no two equal cars). In the example below, I am using a HashMap.

According to the Java documentation, if we have two Car objects car1 and car2 such that car1.equals(car2) == true, then car1.hashCode() == car2.hashCode() must also hold. Therefore, in this example, if I want to compare cars only by their color, I use only the color field in equals() and hashCode(), as I did in my code, and it works fine.

    import java.util.HashMap;
    import java.util.Map;

    public class Car {
        String color;
        String model;

        // Only color takes part in hashCode() and equals().
        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((color == null) ? 0 : color.hashCode());
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            Car other = (Car) obj;
            if (color == null) {
                if (other.color != null)
                    return false;
            } else if (!color.equals(other.color))
                return false;
            return true;
        }

        public Car(String color, String model) {
            super();
            this.color = color;
            this.model = model;
        }

        @Override
        public String toString() {
            return color + "\t" + model;
        }

        public static void main(String[] args) {
            Map<Car, Car> cars = new HashMap<Car, Car>();
            Car a = new Car("red", "audi");
            Car b = new Car("red", "bmw");
            Car c = new Car("blue", "audi");
            cars.put(a, a);
            cars.put(b, b); // b equals a, so this replaces the value stored under a
            cars.put(c, c);
            for (Car car : cars.keySet()) {
                System.out.println(cars.get(car));
            }
        }
    }

Output:
red bmw
blue audi

as expected.

So far, so good. Now I'm experimenting with other ways of comparing two cars. I have a function that measures the similarity between two cars. For the sake of argument, say I have a method double similarity(Car car1, Car car2), which returns a double value in the interval [0,1]. I consider two cars equal if their similarity is greater than 0.5. So I override the equals method:

    @Override
    public boolean equals(Object obj) {
        Car other = (Car) obj;
        return similarity(this, other) > 0.5;
    }

Now I don't know how to override hashCode() so that the hashCode/equals contract always holds, i.e. two equal objects always have equal hash codes.

I thought about using a TreeMap instead of a HashMap to avoid overriding hashCode, because I have no idea how to do it correctly. But I don't need sorting, so I don't think a TreeMap suits this problem, and I expect it to be more expensive in terms of complexity.

It would be very helpful if you could suggest either a way to override hashCode or an alternative data structure better suited to my problem.

Thank you in advance!


java equals hashcode

giliev Feb 09 '15 at 12:49


6 answers

k_g · Answer 1 · 2015-02-09T01:08:33+0000

Although sprinter has raised some issues with your strategy, there is also a contract problem with your method. According to the Javadoc,

[equals] is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true

However, x may be similar to y and y similar to z while x is too far from z to be similar, so your equals method doesn't work.
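To make that concrete, here is a minimal sketch. It does not use the asker's real similarity function (which isn't shown); as an assumption, each "car" is reduced to a single number and similarity is one minus the distance between the numbers. The class and method names are made up for illustration.

```java
// Hypothetical stand-in: each "car" is one number in [0, 1], and
// similarity is 1 minus the distance between the two numbers.
class TransitivityDemo {
    static double similarity(double x, double y) {
        return 1.0 - Math.abs(x - y);
    }

    // Mirrors the rule from the question: "equal" when similarity exceeds 0.5.
    static boolean similar(double x, double y) {
        return similarity(x, y) > 0.5;
    }

    public static void main(String[] args) {
        double x = 0.0, y = 0.4, z = 0.8;
        System.out.println(similar(x, y)); // true  (similarity 0.6)
        System.out.println(similar(y, z)); // true  (similarity 0.6)
        System.out.println(similar(x, z)); // false (similarity about 0.2)
    }
}
```

Any threshold on a continuous similarity can produce such a chain, so the problem is not specific to the value 0.5.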

Mika'il · Answer 2 · 2015-02-09T01:17:12+0000

You should not tamper with the equals and hashCode methods this way. Collection data structures depend on these methods, and using them in a non-standard way will lead to unexpected behavior.

I suggest you create a Comparator implementation that compares two cars, or implement the Comparable interface; either one can use your similarity method underneath.

sprinter · Answer 3 · 2015-02-09T00:59:24+0000

There are a few points here.

The first is the unusual use of equals. Usually equals is interpreted to mean that the two objects are interchangeable; you can replace one with the other without any effect.

The second point is that a.equals(b) implies a.hashCode() == b.hashCode(), but not vice versa. In fact, it is perfectly legal (albeit pointless) for all objects to return the same hash code. Thus, in your case, as long as all sufficiently similar cars return the same hash code, the various collections will work correctly.
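As a rough illustration of the "legal but pointless" case, the sketch below uses a hypothetical Paint class (not from the question) whose hashCode() returns a constant. The set still deduplicates correctly, because equals() makes the final decision; only performance suffers, since every element collides.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class ConstantHashDemo {
    // Hypothetical class: equality is decided by color only.
    static final class Paint {
        final String color;

        Paint(String color) {
            this.color = color;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Paint && Objects.equals(color, ((Paint) obj).color);
        }

        // Satisfies the contract, but every instance collides with every other.
        @Override
        public int hashCode() {
            return 42;
        }
    }

    // Adds one Paint per color and reports how many distinct elements remain.
    static int distinctCount(String... colors) {
        Set<Paint> set = new HashSet<>();
        for (String color : colors) {
            set.add(new Paint(color));
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount("red", "red", "blue")); // 2
    }
}
```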

I suspect you would be better off with a separate class that represents your "similar" concept. You can then test similarities for equality, or keep a map from a similarity to a list of cars. This may represent the concept better than overriding equals for cars.

Matt mchenry · Answer 4 · 2015-02-09T01:06:45+0000

hashCode() is just a shortcut for equals(). What matters is that the scheme you are going for makes sense for equals. Consider cars a, b and c, where similarity(a, b) == 0.6 and similarity(b, c) == 0.6.

But what if similarity(a, c) == 0.3? Then you are in a situation where a.equals(b) and b.equals(c), but mysteriously a.equals(c) is false.

This violates the general contract of Object.equals() . When this happens, parts of the standard library, such as HashMap and TreeMap , will suddenly start behaving very strangely.

If you are interested in plugging in different sorting schemes, you are much better off working with different Comparator<Car> implementations, one per scheme. Although the same restriction applies to the Comparator API [1], it lets you represent less-than and greater-than, which sounds like what you are really after and which cannot be done with Object.equals().

[1] If compare(a, b) == 0 and compare(b, c) == 0, then compare(a, c) must also be 0.
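A minimal sketch of that idea, assuming a stripped-down Car with only color and model; the comparator names BY_COLOR and BY_MODEL are made up for illustration. Each comparator is an independent ordering scheme, and Car's own equals() and hashCode() stay untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ComparatorDemo {
    // Simplified stand-in for the Car class in the question.
    static final class Car {
        final String color;
        final String model;

        Car(String color, String model) {
            this.color = color;
            this.model = model;
        }

        @Override
        public String toString() {
            return color + " " + model;
        }
    }

    // Two independent schemes; neither changes what equals() means for Car.
    static final Comparator<Car> BY_COLOR = Comparator.comparing((Car car) -> car.color);
    static final Comparator<Car> BY_MODEL = Comparator.comparing((Car car) -> car.model);

    // Returns a sorted copy so the caller's list is left as it was.
    static List<Car> sorted(List<Car> cars, Comparator<Car> order) {
        List<Car> copy = new ArrayList<>(cars);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(
                new Car("red", "bmw"), new Car("blue", "audi"), new Car("red", "audi"));
        System.out.println(sorted(cars, BY_COLOR));
        // [blue audi, red bmw, red audi]
        System.out.println(sorted(cars, BY_MODEL.thenComparing(BY_COLOR)));
        // [blue audi, red audi, red bmw]
    }
}
```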

Federico peralta schaffner · Answer 5 · 2015-02-09T15:26:16+0000

As others have stated, your latest implementation of .equals() violates its contract. You simply cannot implement it that way. And if you stop to think about it, it makes sense, since your .equals() implementation is meant to return true not when two objects are actually equal, but when they are similar enough. Similar enough is not the same as equal, in Java or anywhere else.

Check the .equals() Javadoc and you will see that any object that implements it must adhere to its contract:

The equals method implements an equivalence relation on non-null object references:
It is reflexive: for any non-null reference value x, x.equals(x) should return true.
It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
For any non-null reference value x, x.equals(null) should return false.

Your implementation of .equals() does not fulfill this contract:

Depending on your implementation of double similarity(Car car1, Car car2), it may not be symmetric.
It is clearly not transitive (well explained in previous answers).
It may not be consistent:

Consider an example slightly different from the one you gave in the comment:

'cobalt' will be equal to blue, while red will be different from blue

If you used some external source to calculate the similarity, for example a dictionary, and one day "cobalt" was not found as an entry, you might return a similarity near 0.0, so the cars would not be equal. The next day, however, you realize that "cobalt" is a special kind of "blue", so you add it to the dictionary, and this time when you compare the same two cars the similarity is very high (1.0 or close to it), so they are equal. That would be an inconsistency. I don't know how your similarity function works, but if it depends on anything other than the data contained in the two objects being compared, you may be violating the consistency constraint of .equals().

Regarding the use of TreeMap<Car, Whatever> , I don’t see how this could help. From TreeMap javadocs :

... the Map interface is defined in terms of the equals operation, but a sorted map performs all key comparisons using its compareTo (or compare) method, so two keys that are deemed equal by this method are, from the standpoint of the sorted map, equal.

In other words, for TreeMap<Car, Whatever> map, map.containsKey(car1) returns true iff car1.compareTo(car2) returns exactly 0 for some car2 in map. If no comparison returns 0, map.containsKey(car1) returns false, even if car1 and car2 are very similar according to your similarity function. This is because .compareTo() is intended for ordering, not similarity.
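A small sketch of this behavior, assuming a simplified Car and a comparator on color only (both are illustrative choices, not the asker's code). A key is "found" exactly when the comparison yields 0, no matter how similar the cars otherwise are.

```java
import java.util.Comparator;
import java.util.TreeMap;

class TreeMapKeyDemo {
    // Simplified stand-in for the Car class in the question.
    static final class Car {
        final String color;
        final String model;

        Car(String color, String model) {
            this.color = color;
            this.model = model;
        }
    }

    // Stores one car in a TreeMap ordered by color, then looks up another.
    static boolean foundByColor(Car stored, Car probe) {
        TreeMap<Car, String> map = new TreeMap<>(Comparator.comparing((Car car) -> car.color));
        map.put(stored, "some value");
        return map.containsKey(probe);
    }

    public static void main(String[] args) {
        Car redAudi = new Car("red", "audi");
        // Same color: the comparator returns 0, so the key counts as present.
        System.out.println(foundByColor(redAudi, new Car("red", "bmw")));         // true
        // "light red" compares as different from "red", however similar it looks.
        System.out.println(foundByColor(redAudi, new Car("light red", "audi"))); // false
    }
}
```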

So, the key point here is that you cannot use just a Map for your use case, because it is simply the wrong structure. In fact, you cannot use any Java structure that relies on .hashCode() and .equals(), because you could never reliably find the object that matches your key.

Now, if you want to find the car that most closely resembles a given car using your similarity() function, I suggest you use Guava's HashBasedTable to build a table of similarity coefficients (or whatever name you prefer) between every pair of cars in your set.

For this approach, Car needs to implement .hashCode() and .equals() as usual (i.e. not by checking only the color and, of course, without calling your similarity() function). For example, you could compare a new plate-number attribute of Car.

The idea is to have a table that stores the similarity between every pair of cars, with its diagonal left empty, since we already know that a car is similar to itself (in fact, it is equal to itself). For example, for the following cars:

    Car a = new Car("red", "audi", "plate1");
    Car b = new Car("red", "bmw", "plate2");
    Car c = new Car("light red", "audi", "plate3");

the table will look like this:

           a      b      c
    a    ----   0.60   0.95
    b    0.60   ----   0.45
    c    0.95   0.45   ----

As for the similarity values, I assume that cars of the same brand and the same color family are more similar than cars of the same color but different brands, and that cars of different brands and different colors are even less similar.

You may have noticed that the table is symmetric. We could store only half of the cells if space optimization were needed. However, according to the docs, HashBasedTable is optimized for access by row key, so let's keep it simple and leave further optimization as an exercise.

The algorithm for finding the car most similar to a given car can be sketched as follows:

Get the row of the given car
Return the car in that row that is most similar to the given car, i.e. the one with the highest similarity coefficient

Here is some code that shows the general idea:

    import com.google.common.collect.HashBasedTable;
    import com.google.common.collect.Maps;
    import com.google.common.collect.Table;
    import java.util.Map;

    public class SimilarityTest {

        Table<Car, Car, Double> table;

        void initialize(Car... cars) {
            // cars.length doubles as an implicit null check; never pass a negative size
            int size = Math.max(cars.length - 1, 0);
            this.table = HashBasedTable.create(size, size);
            for (Car rowCar : cars) {
                for (Car columnCar : cars) {
                    if (!rowCar.equals(columnCar)) { // add only different cars
                        double similarity = this.similarity(rowCar, columnCar);
                        this.table.put(rowCar, columnCar, similarity);
                    }
                }
            }
        }

        double similarity(Car car1, Car car2) {
            // Place your similarity calculation here
            throw new UnsupportedOperationException("similarity not implemented");
        }

        Car mostSimilar(Car car) {
            Map<Car, Double> row = this.table.row(car);
            // Start below any possible coefficient; Double.MIN_VALUE is positive,
            // so a similarity of 0.0 would never beat it.
            Map.Entry<Car, Double> mostSimilar = Maps.immutableEntry(car, Double.NEGATIVE_INFINITY);
            for (Map.Entry<Car, Double> entry : row.entrySet()) {
                double mostSimilarCoefficient = mostSimilar.getValue();
                double currentCoefficient = entry.getValue();
                if (currentCoefficient > mostSimilarCoefficient) {
                    mostSimilar = entry;
                }
            }
            return mostSimilar.getKey();
        }

        public static void main(String... args) {
            SimilarityTest test = new SimilarityTest();

            Car a = new Car("red", "audi", "plate1");
            Car b = new Car("red", "bmw", "plate2");
            Car c = new Car("light red", "audi", "plate3");

            test.initialize(a, b, c);

            Car mostSimilarToA = test.mostSimilar(a);
            System.out.println(mostSimilarToA); // should be c

            Car mostSimilarToB = test.mostSimilar(b);
            System.out.println(mostSimilarToB); // should be a

            Car mostSimilarToC = test.mostSimilar(c);
            System.out.println(mostSimilarToC); // should be a
        }
    }

Regarding complexity: initializing the table takes O(n^2), while finding the most similar car takes O(n). I am sure this can be improved. For example, why put cars in the table that are known not to be similar to each other? (We could store only pairs whose similarity coefficient is above a given threshold.) Or, instead of finding the car with the highest similarity coefficient, we could stop searching as soon as we find a car whose coefficient is above another given threshold, etc.
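For readers without Guava on the classpath, the same lookup can be sketched with plain nested HashMaps. This swaps HashBasedTable for a Map of Maps and, as a simplifying assumption, identifies cars by the labels "a", "b" and "c" and hard-codes the coefficients from the table above instead of computing them.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SimilarityTableDemo {
    // Row label -> (column label -> similarity coefficient).
    private final Map<String, Map<String, Double>> table = new HashMap<>();

    // Stores the coefficient in both directions, since the table is symmetric.
    void put(String first, String second, double similarity) {
        table.computeIfAbsent(first, key -> new HashMap<>()).put(second, similarity);
        table.computeIfAbsent(second, key -> new HashMap<>()).put(first, similarity);
    }

    // Scans the row of the given car in O(n); returns null if the row is empty.
    String mostSimilar(String car) {
        String best = null;
        double bestCoefficient = Double.NEGATIVE_INFINITY;
        Map<String, Double> row = table.getOrDefault(car, Collections.<String, Double>emptyMap());
        for (Map.Entry<String, Double> entry : row.entrySet()) {
            if (entry.getValue() > bestCoefficient) {
                bestCoefficient = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    // Builds the example table: a-b 0.60, a-c 0.95, b-c 0.45.
    static SimilarityTableDemo example() {
        SimilarityTableDemo demo = new SimilarityTableDemo();
        demo.put("a", "b", 0.60);
        demo.put("a", "c", 0.95);
        demo.put("b", "c", 0.45);
        return demo;
    }

    public static void main(String[] args) {
        SimilarityTableDemo demo = example();
        System.out.println(demo.mostSimilar("a")); // c
        System.out.println(demo.mostSimilar("b")); // a
        System.out.println(demo.mostSimilar("c")); // a
    }
}
```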

Jlewkovich · Answer 6 · 2015-02-09T02:11:02+0000

Based on my understanding of your similarity() method, I think it's best to keep the hashCode() function roughly the same, but instead of using color.hashCode() directly, create a helper method that returns a "similar color" and use that color's hash code:

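    // Returns the base color of a family, so the return type is String, not int.
    public String getSimilarColor(String color) {
        // Compare strings with equals(); == only checks reference identity.
        if (color.equals("blue") || color.equals("light blue") || color.equals("dark blue") /* add more blue colors */) {
            return "blue";
        } else if (color.equals("red") || color.equals("light red") || color.equals("dark red") /* add more red colors */) {
            return "red";
        }
        /* else if (yellow...) else if (etc...) */
        else {
            return color;
        }
    }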

And then use it in the hashCode method:

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((color == null) ? 0 : getSimilarColor(color).hashCode());
        return result;
    }

This helper method can also be useful in similarity(). If you don't like hard-coding the colors in your method, you can derive them some other way, for example by pattern matching.
