Saturday, 12 February 2011

Implementing the equals() hashCode() Contract

The Java documentation for the Object.equals() method states that: "it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes." This is rather vague, is it necessary, or just a good idea to implement hashCode() when you override equals()?

The reason you need to produce good hash codes is for efficient handling of your objects by collection classes, especially Hashtable and HashMap, but use of the word generally gives you the feeling that you can be lazy and ignore the rule.

But can you? I suspect not and that the only way to guarantee the contract is to actually do those implementations; hence this blog.

The Java API goes on in great details about how you should implment equals() and hashCode() mentioning words like reflexive, symmetric and transitive, but being a programmer, I thought that it would be simpler to write the contract as code:

  public static boolean hashContract(Object a, Object b) {

   
return (a.equals(b)) && (a.hashCode() == b.hashCode());
 
}

Using the sample class EqualsHash below, the rest of the blog demonstrates how to implement its equals() and hashCode() methods.

public class EqualsHash {

 
private final int num;
 
private final String data;

 
public EqualsHash(int num, String data) {
   
this.num = num;
   
this.data = data;
 
}
}

Correct Implementation of equals()

To correctly implement the equals method, the object must obey the following equivalence relation:


  • It is reflexive: for any reference value x, x.equals(x) should return true.
  • It is symmetric: for any reference values x and y, x.equals(y) should return true
    if and only if y.equals(x) returns true.
  • It is transitive: for any reference values x, y, and z, if x.equals(y) returns true
    and y.equals(z) returns true, then x.equals(z) should return true.
  • It is consistent: for any reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the object is modified.
  • For any non-null reference value x, x.equals(null) should return false.

Now that’s the small print, but what does it actually mean?

To implement the rules above there are three steps to take:
  1. Compare the parameter object with this in order to optimise the equals for performance. This is especially true if your equals method takes time to execute.
    if(this == obj)
        return true;
    
  2. Compare the type of object:
    if((obj == null) || (obj.getClass() != this.getClass()))
        return false;
    
  3. Lastly, compare the contents of the object:
    return num == test.num &&
        (data == test.data || (data != null && data.equals(test.data)));
    
Note that the last line of code performs the deep equals check.

Avoid the use of instanceof when testing the object type.
if(!(obj instanceof Test)) return false; // avoid
You should use obj.getClass() instead, as this ensures that it will return false if the argument is a subclass of the test class whereas the instanceof operator violates the symmetry requirement of the contract as it will return true if the argument is a subclass of your test class. The instanceof check is correct only if your test class is final because no subclasses exist. obj.getClass() will work for both final and non-final classes. Both instanceof and obj.getClass() will return false if the argument is null. The instanceof operator returns false if the left hand side (LHS) operand is null, irrespective of the operand on the right hand side (RHS) as specified by JLS 15.20.2; however, the obj.getClass() should be preferred for better type checking.

Note that you also need to check the 'data' string to avoid a null pointer exception:
(data == test.data || (data != null && data.equals(test.data)))
Putting all this into practise should give you something like:

  @Override
 
public boolean equals(Object obj) {

   
// Are we checking the same object??
   
if (this == obj)
     
return true;
   
// Are we check the same type of object??
   
if ((obj == null) || (obj.getClass() != this.getClass()))
     
return false;
   
// object must be EqualsHash tyep at this point
    // so safely cast and do the deep comparison.
    // Note the data == test.data and data.equals(test.data)
   
EqualsHash test = (EqualsHash) obj;
   
return num == test.num
        &&
(data == test.data || (data != null && data.equals(test.data)));
 
}

hashCode() Implementation


When implementing a hash code, the following rule must be obeyed: "Equal objects must produce the same hash code as long as they are equal, however unequal objects need not produce distinct hash codes".

Writing a very good implementation of the hashCode() method which calculates hash code values such that the distribution is uniform is not a trivial task and may require inputs from mathematicians and theoretical computer scientists. Nevertheless, it is possible to write a decent and correct implementation by following few simple guidelines.

1. Store an arbitrary non-zero constant integer value (say 7) in an int variable, called hash.
2. Involve the significant instance variables of your object in the calculation of the hash code. All the variables that are part of equals() comparison must be considered for this. Compute an individual hash code, (say: hashCode) for each instance variable as follows:
  • If the variable var is a byte, char, short or int, then its hash code = (int)var;
  • If the variable var is a long, then its hash code = (int)(var ^ (var >>> 32));
  • If the variable var is a float, then its hash code = Float.floatToIntBits(var);
  • If the variable var is a double, then its hash code =
    long bits = Double.doubleToLongBits(var);
    hash_code = (int)(bits ^ (bits >>> 32));
    
  • If the variable var is boolean, then its hash code = var ? 1 : 0;
  • If the variable var is an object reference, then check if it is null, if yes
    then its hash code = 0; otherwise invoke the hashCode method recursively on this object
    reference to get its hash code. This can be simplified and given as:
    hash_code = (null == var ? 0 : var.hashCode());
    
3. Combine this individual variable hash codes above with original hash code hash as follows: hash = 31 * hash + hash_code;
4.Repeat these steps for all your object’s significant instance variables and return the resulting integer hash.
5. Lastly, review your hashCode() method and check if it is returning equal hash codes for equal objects and also verify that the hash codes returned for the object are equal for multiple invocations during the same execution.

So, for hash code generation, it's a matter of pick a number, any number so long as you obey the rules.

  @Override
 
public int hashCode() {
   
int hash = 7;
    hash =
31 * hash + num;
    hash =
31 * hash + (null == data ? 0 : data.hashCode());
   
return hash;
 
}

The guidelines provided here for implementing equals() and hashCode() methods are merely useful as guidelines, these are not absolute laws or rules. Nevertheless, following them while implementing these two methods will give you correct and consistent results.

No comments: