Tuesday, 8 February 2011

What do all the different HashMaps do?

It seems strange to me that the Java API has been allowed to grow to the point where there’s so much overlapping functionality. If, as a Java API developer, you were allowed to ‘refactor mercilessly’ then there would be whole swathes of classes that would suddenly find themselves deprecated.

A case in point is the fact that the API contains at least seven map implementations: HashMap, LinkedHashMap, WeakHashMap, IdentityHashMap, ConcurrentHashMap, Hashtable, and ConcurrentSkipListMap and that’s without mentioning the map types returned by the Collections static class or the concurrent implementations added in Java 6. All these maps are slightly different; some are thread safe, others are not; some have constant time access, some are list based and some accept null value entries whilst some don’t.

Having had the good fortune to spend a great deal of time refactoring other peoples code you find that 99.9% of developers only ever use the HashMap implementation irrespective of how their code works. I suspect that the questions: “Do I need a thread safe map?” or “Do I need time constant performance irrespective of size?” never gets asked. If this is the case, it begs the question: “why not deprecate a few map implementations?”, I know I would...

One of the goals of the collections interfaces (List, Set, Map etc.) was that their implementation classes should be interchangeable and in some respects the Java API has failed here. The Map implementation guidelines don’t specify whether or not a Map can or cannot contain null values. Some implementation allow this some don’t. Although minor, this is a problem if you have to suddenly change your map type in mid-development; for example you realise that you need a thread-safe map and switch from HashMap to ConcurrentHashMap. You could find that your threading issue disappears only to be replaced by NullPointerExceptions.

In an effort to clarify which map implementations do what, I’ve compared several below...

HashMap

A general purpose map where the contents are not ordered. It has constant time performance, i.e. all values are retrieved in the same time irrespective of position in the map and size of the map. It accepts null values. It is not thread-safe and doesn’t allow concurrent access.

Use either as a simple stack base variable. Don’t use as a class instance variable, prefer ConcurrentHashMap unless you are certain your application is single threaded. Remember that JEE container apps are multi-threaded, a fact often forgotten leading to various problems.

LinkedHashMap

A linked list implementation of a map, where every thing's stored in the order in which it was inserted into the map. It accepts null values. It is not thread-safe and doesn’t allow concurrent access.

Use this in the same way as a HashMap, when you also need to be able to iterate through its contents knowing their insertion order irrespective of key value.

WeakHashMap

I’ve mentioned this before. It holds weak references to its contents allowing memory to be garbage collected when no other reference to an entry exists. It accepts null values.

Use this implementation when you need the kind of global map that exists for the whole lifetime of your running program. In that way, you’ll avoid memory leaks.

IdentityHashMap

Built for extra speed (out performs HashMap). This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys and values. In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2). In normal Map implementations (like HashMap) two keys k1 and k2 are considered equal if and only if (k1==null ? k2==null : k1.equals(k2)).

This is not threadsafe and accepts null values.

Do you ever need to use this? Is speed such an issue with today’s JVMs and processors? Do those extra couple of nano-seconds matter?

ConcurrentHashMap

This is a map implementation that provides the same functionality as HashMap with the benefit of it being thread-safe and fully concurrent. It’s a direct replacement for the synchronized Hashtable map implementation and according to Brian Goetz book Java Concurrency in Practice, it’s probably quicker, and can offer “dramatic scalability improvements with little risk”. It does not accept null values.

Hashtable

A synchronized threadsafe general purpose map that does not accept null values. Don’t use this, use ConcurrentHashMap instead it's much quicker.

ConcurrentSkipListMap

A sorted list implementation of a map, where every thing's stored in the order in which it was inserted. It does not accept null values. It is thread-safe, fully concurrent hash map.

Final Recommendations

I’ve personally never needed to use the linked / sorted map implementations; maybe my mind doesn’t work in that way and I’ve missed out somewhere, or maybe they’re not that useful and, keeping to the KISS paradigm, not really needed. As for the other implementations, the contrived code below makes a few rule of thumb recommendations.

public class MapGuideLines {

 
/** Always use the concurrent hash map as an instance variable */
 
private Map<String,String> instanceMap = new ConcurrentHashMap<String,String>();
 
 
/** Although not threadsafe, use this as a global map to avoid memory leaks */
 
private static Map<String,String> globalMap = new WeakHashMap<String,String>();
 
 
public void myMethod(String key,String value) {
   
   
// This is thread safe as it's on the stack
   
Map<String,String> stackMap = new HashMap<String,String>();
   
   
// This is problematic, as stackMap can contain null values
    // and may result in a NullPointerException
   
instanceMap.putAll(stackMap);
   
   
// This will work - both WeakHashMap and HashMap accept null values
   
globalMap.putAll(stackMap);
 
}
}

No comments: