Archive for June 20th, 2008

June 20th, 2008

hashCode(): The Easiest Way to Kill Application Performance

by Tim Cull

There’s no faster way to kill your application’s performance than by implementing an inefficient hashCode() function in some low-level, commonly used class. As with SimpleDateFormat problems, I’ve seen this hashCode() problem twice now “in the wild” and the results aren’t pretty. Both times, the problem could only be diagnosed by using a profiler (JProfiler in both cases), which bolsters even further the argument that you’re wasting your time trying to do performance improvement if you’re not using a profiler.

Why does hashCode() matter so much, you ask? Think about what happens every time your application puts an object into a java.util.Collection class, especially (wait for it….) Hashtable and HashMap. Notice something common there? Yes, that’s it, the word “hash”, as in “hash function”, which means that each of those collection classes will call your humble little hashCode() function millions of times. So if your hashCode() function is doing anything more exotic than returning
an int it had already pre-calculated in the constructor then you’ve got problems.

The first time I saw this problem in the wild was an application that was displaying domain objects in a Swing (more precisely, JIDE) table by storing the domain objects directly in a table model. The table model was both sortable and filterable, so the sorting and filtering algorithms ended up calling hashCode methods several million times over the course of a minute, especialy when the user was scrolling back and forth. Ultimately, the problem came down to an Identifier class that
was often used as a key:

public class Identifier {
       private Object type;
       private Object value;

       public Identifier(type, value){
               this.type = type;
               this.value = value;
       }

       ...some stuff, including equals()....

       public int hashCode() {
               return type.hashCode() ^ value.hashCode();
       }
}

The application wasn’t very responsive, so we profiled it and discovered that this implementation took something crazy like 80% of the time used by the Swing event thread simply calling hashCode().

The second time we had an application was taking forever and a day to start up and prime its caches of data. One of many culprits turned out to be a custom class to implement the concept of a compound key:

public class Key {
       private Map members;

       public Key(Map members){
               this.members = members;
       }

       ...some stuff, including equals()...

       public int hashCode() {
               Iterator i = members.entrySet().iterator();
               List sortedSet = new ArrayList();
               while (i.hasNext()){

sortedSet.add(i.next().value().toString().trim());
               }

               // sort is because members.entrySet() has no guaranteed
order
               Collections.sort(sortedSet);

               Iterator j = sortedSet.iterator();
               String hashStr = "";
               while (j.hasNext()){
                       hashStr += j.next();
               }
               return hashStr.hashCode();
       }
}

This code is just downright awful, performance or no performance, though in its defense I will say it was written many years ago by a team of guys who were both junior and new to Java from VB6. Profiling the application revealed that it was spending 70% of its cache loading time simply calling Key.hashCode()–and far more time calling hashCode() than it even spent pulling the data it was caching out of the database.

Both these cases had two things in common: 1) the classes were very low level, ubiquitous, and probably written early in the project, and 2) they both ended up being a key in a Map. Both of them also prove a mantra I’m developing: if you’re trying to do performance tuning without the hard data you get from a profiler, then you are wasting your time. Without a profiler I would never have thought to look at them.