Why doesn't the equals method in String use a hash?

The code of the equals method in the String class is

 public boolean equals(Object anObject) { if (this == anObject) { return true; } if (anObject instanceof String) { String anotherString = (String)anObject; int n = count; if (n == anotherString.count) { char v1[] = value; char v2[] = anotherString.value; int i = offset; int j = anotherString.offset; while (n-- != 0) { if (v1[i++] != v2[j++]) return false; } return true; } } return false; } 

My question is: why does this method not use hashCode ()?

As far as I know, hashCode () can quickly compare two strings.

UPDATE: I know that two unequal lines can have the same hashes. But two equal lines have equal hashes. So, using hashCode (), we can immediately see that the two lines are unequal.

I just think that using hashCode () can be a good filter in equals .

UPDATE 2: Here is some code that we are talking about here.

This is an example of how a String method might look like

 public boolean equals(Object anObject) { if (this == anObject) { return true; } if (anObject instanceof String) { String anotherString = (String)anObject; if (hashCode() == anotherString.hashCode()){ int n = count; if (n == anotherString.count) { char v1[] = value; char v2[] = anotherString.value; int i = offset; int j = anotherString.offset; while (n-- != 0) { if (v1[i++] != v2[j++]) return false; } return true; } }else{ return false; } } return false; } 
+42
java string hashcode
Jan 10 '13 at 16:20
source share
8 answers

Hashcode can be a check of the first round for inequality. However, it presents some compromises.

  • String hash codes are lazily computed, although they use the value "guard". If you compare strings with long lifetimes (that is, they most likely calculated a hash code), that is not a problem. Otherwise, you are stuck in computing a hash code (potentially expensive) or ignoring checking when the hash code has not yet been computed. If you have many short-lived lines, you will ignore the check more often than you will use it.
  • In the real world, most strings differ in their first few characters, so you won’t save much by checking hashcode first. Of course, there are exceptions (for example, URLs), but then again, in real programming they happen infrequently.
+37
Jan 10 '13 at
source share
— -

This issue has actually been reviewed by the JDK developers. I could not find in various posts why it was not included. The enhancement is also included in the error database .

Namely, one of the proposed changes:

 public boolean equals(Object anObject) { if (this == anObject) // 1st check identitiy return true; if (anObject instanceof String) { // 2nd check type String anotherString = (String)anObject; int n = count; if (n == anotherString.count) { // 3rd check lengths if (n != 0) { // 4th avoid loading registers from members if length == 0 int h1 = hash, h2 = anotherString.hash; if (h1 != 0 && h2 != 0 && h1 != h2) // 5th check the hashes return false; 

There was also a discussion of using == for interned strings (i.e. if both strings are interned: if (this != anotherString) return false; ).

+15
Jan 10 '13 at 16:43
source share

1) The calculation of hashCode may not be faster than comparing strings directly.

2) if hashCode is equal, the lines may still not be equal

+9
Jan 10 '13 at 16:21
source share

AFAIK. You can add the following check to String. This checks that if the hash codes are set and they are different, then the strings cannot be equal.

 if (hash != 0 && anotherString.hash != 0 && hash != anotherString.hash) return false; if (hash32 != 0 && anotherString.hash32 != 0 && hash32 != anotherString.hash32) return false; 
+3
Jan 10 '13 at
source share

This may be a good idea for many use cases.

However, as a foundation class, which is widely used in all types of applications, the author really does not know if this additional test can save or degrade performance on average.

I assume most String.equals() is called in the Hashmap after the hash codes are known to be equal, so testing the hash codes again is pointless.

If we look at comparing 2 random strings, even with a small char value like US ASCII, it is very likely that the hashes will be different, and the char -by-char comparison will fail with a 1st char error, so that would be a waste of checking the hashes .

+3
Jan 10 '13 at 17:17
source share

As I think, hashCode () can more quickly compare two string comparisons.

Arguments?

Arguments against this proposal:

  • Additional operations

hashcode() from String should access each character in the string and perform 2 calculations for each character.
Therefore, we need a string with n characters of 5*n operations (load, multiply, search / load, multiply, save). Two times because we are comparing two lines. (Well, one repository and one load are not really taken into account in a reasonable implementation.)
In the best case, this does the 10*x operation for two lines with lengths m and n and x=min(m,n) . The worst case is 10*x with x=m=n . The average is somewhere between, possibly (m*n)/2 .

Current is equal to implementation needs in the best cases 3 . 2 loads, 1 compare. The worst thing is 3*x operations for two lines with lengths m and n and x=m=n . The average is somewhere between maybe 3*(m*n)/2 .

  • Even if we cache the hash, it’s not clear that we are saving anything

We need to analyze usage patterns. It may be that most of the time we will ask once for equal, and not several times. Even if we ask a few times, this is not enough to save time on caching.

Not directly against speed, but still good counterarguments:

  • Intuitive counter

We do not expect hashcode to equal equals, because we know for sure that hash(a)==hash(b) for some a!=b Everyone who reads this (and knowledge of hashing) is wondering what is going on there.

  • leads to unsuccessful examples / unexpected behavior

I already see the following question about SO: “I have a line with a billion times“ a. ”Why is it worth forever comparing it with equal () versus“ b ”? :)

+1
Jan 10 '13 at 18:01
source share

A hash string is not available for free and automatically. To rely on a hash code, it must be calculated for both strings, and only then can they be compared. Since collisions are possible, a second char -by-char comparison is required if the hash codes are equal.

While String seems unchanged to the ordinary programmer, it has a private field for storing its hash code after it has been computed. However, this field is only calculated when a hash code is required first. As you can see from the String source code here :

  private int hash; public int hashCode() { int h = hash; if (h == 0) { ... hash = h; } return h; } 

Therefore, it is not obvious that it makes sense to first calculate the hash code. For your specific case (perhaps the same instances of really long strings are compared to each other really many times), it could still be: profile.

0
Jan 10 '13 at 16:30
source share

If the hash code takes into account the entire contents of the string, then computing the hash code of a string with n characters takes n operations. For long lines this is a lot. Comparing two strings takes n operations, if they are the same, no longer than computing a hash. But if the lines are different from each other, then the difference is likely to be found much earlier.

String hash functions usually do not account for all characters for very long strings. In this case, if I compare two strings, I could first compare the characters used by the hash function, and I am at least as fast as checking the hashes. But if there is no difference between these characters, then the hash value will be the same, so I still have to compare all the strings.

Summary Good string comparisons never happen slower, but often much faster than hashes comparison (and string comparisons when hashes match).

0
Mar 27 '14 at 12:30
source share



All Articles