Archive for June, 2008

June 27th, 2008

ResultSet: Watch the Scroll Type

by Tim Cull

It turns out that using ResultSet.TYPE_SCROLL_INSENSITIVE or ResultSet.TYPE_FORWARD_ONLY on a java.sql.ResultSet can potentially give you a vastly different memory footprint. For example, we discovered today that a process pulling 42,000 rows out of a database and converting them into objects might take 70MB to do its job, or 800MB to do its job, depending on which type you use (at least with a Sybase jconn2 JDBC driver).

That get your interest? Read on to see my observations and my wild-ass guess at why.

Again profiling with JProfiler (really, you should not be doing performance analysis without a profiler or you will never find these kinds of things) I discovered that, in particular, ResultSet.getInt() using jconn2 is an incredible memory pig. It uses 7 times as much memory in temporary objects as calls to getDate(), getString() and getDouble(). I’m talking about objects that it uses in the process of building the “int” it returns to you, not the actual “int” (which is clearly very small). So, why do you care? If you’re using TYPE_FORWARD_ONLY you don’t have to care much unless you’re just trying to get the garbage collector to run less often. But if you’re using TYPE_SCROLL_INSENSITIVE (I can’t speak for TYPE_SCROLL_SENSITIVE because I didn’t test it) then you should care. A lot.

The reason is that the ability to scroll backwards and forwards with your result set doesn’t come free. It requires the result set to hold onto a lot more data internally instead of letting it go right away. Consequently, with a forward-only result set we were seeing the memory released every time we moved from one row to another, but the scrolling result set held onto all the memory until the very bitter end, when the result set was finally closed. For the application in question, that meant on our highest volume day of the year we simply couldn’t start. Even though the data we were trying to cache on startup only totalled 70MB or so in the end, it required (and held onto until the result set was closed) a whopping 800MB of memory to build that 70MB of data. That 800MB, when combined with the other data already in memory, meant we exceeded the 1.3GB limit for a JVM running on 32-bit Windows. We were stuck.
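For illustration, here is a minimal sketch of asking for a forward-only result set explicitly, assuming you already hold a java.sql.Connection. The class name ForwardOnlyLoader, the method loadIds and the single int column are made up for the example. Whether a particular driver really lets go of row memory as you advance is driver-specific; the numbers above come only from jconn2.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class ForwardOnlyLoader {

    /**
     * Reads the first column of every row as an int, using a result set
     * that can only move forward. Hypothetical helper, not from the
     * application described in the post.
     */
    public static List<Integer> loadIds(Connection conn, String sql) throws SQLException {
        List<Integer> ids = new ArrayList<>();
        // Request the scroll type explicitly rather than relying on whatever
        // a wrapper or framework might have configured.
        try (Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // Copy what you need out of the row; the driver is then free
                // to discard the row once the cursor moves on.
                ids.add(rs.getInt(1));
            }
        }
        return ids;
    }
}
```

Per the JDBC Javadoc, the no-argument Connection.createStatement() already defaults to TYPE_FORWARD_ONLY, so the thing to look for in a code review is usually a place where someone asked for a scrollable result set and then only ever called next().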

Credit for this last find goes to someone I work with named Shibu; without him I’d probably still be at work trying to figure out where those extra 730MB were coming from.

June 20th, 2008

hashCode(): The Easiest Way to Kill Application Performance

by Tim Cull

There’s no faster way to kill your application’s performance than by implementing an inefficient hashCode() function in some low-level, commonly used class. As with SimpleDateFormat problems, I’ve seen this hashCode() problem twice now “in the wild” and the results aren’t pretty. Both times, the problem could only be diagnosed by using a profiler (JProfiler in both cases), which bolsters even further the argument that you’re wasting your time trying to do performance improvement if you’re not using a profiler.

Why does hashCode() matter so much, you ask? Think about what happens every time your application puts an object into one of the java.util collection classes, especially (wait for it….) Hashtable and HashMap. Notice something common there? Yes, that’s it, the word “hash”, as in “hash function”, which means that each of those collection classes will call your humble little hashCode() function millions of times. So if your hashCode() function is doing anything more exotic than returning an int it has already pre-calculated in the constructor then you’ve got problems.
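If you want to see this for yourself, here is a small sketch that counts the calls. CountingKey and HashCodeCallCounter are invented names for the demonstration, and the exact count you get may depend on your JDK; the point is only that every put and every lookup goes through hashCode().

```java
import java.util.HashMap;
import java.util.Map;

public class HashCodeCallCounter {

    /** Hypothetical key type that counts how often hashCode() is invoked. */
    static final class CountingKey {
        static long calls = 0;
        private final int id;

        CountingKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CountingKey && ((CountingKey) o).id == id;
        }

        @Override
        public int hashCode() {
            calls++; // the only "work" here is the counting itself
            return id;
        }
    }

    /** Does n puts followed by n gets and reports how many hashCode() calls that took. */
    public static long countCalls(int n) {
        CountingKey.calls = 0;
        Map<CountingKey, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new CountingKey(i), "v" + i);
        }
        for (int i = 0; i < n; i++) {
            map.get(new CountingKey(i));
        }
        return CountingKey.calls;
    }

    public static void main(String[] args) {
        System.out.println("hashCode() calls: " + countCalls(100_000));
    }
}
```

Multiply whatever your hashCode() costs by that number and you have a rough idea of what the map is charging you.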

The first time I saw this problem in the wild was an application that was displaying domain objects in a Swing (more precisely, JIDE) table by storing the domain objects directly in a table model. The table model was both sortable and filterable, so the sorting and filtering algorithms ended up calling hashCode methods several million times over the course of a minute, especially when the user was scrolling back and forth. Ultimately, the problem came down to an Identifier class that was often used as a key:

public class Identifier {
       private Object type;
       private Object value;

       public Identifier(Object type, Object value){
               this.type = type;
               this.value = value;
       }

       // ...some stuff, including equals()....

       // recomputed from both fields on every single call
       public int hashCode() {
               return type.hashCode() ^ value.hashCode();
       }
}

The application wasn’t very responsive, so we profiled it and discovered that this implementation took something crazy like 80% of the time used by the Swing event thread simply calling hashCode().
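One way to take that cost out of the hot path is to compute the hash once, in the constructor. The sketch below is not the fix from the real application; it assumes that type and value are non-null and that their own hash codes never change after construction. The class name CachedIdentifier and the equals() shown here are mine, since the original equals() isn't reproduced above.

```java
import java.util.Objects;

public final class CachedIdentifier {
    private final Object type;
    private final Object value;
    private final int hash; // computed once, in the constructor

    public CachedIdentifier(Object type, Object value) {
        this.type = Objects.requireNonNull(type, "type");
        this.value = Objects.requireNonNull(value, "value");
        // Same formula as the original Identifier, but paid for only once.
        // Assumption: type and value are effectively immutable.
        this.hash = type.hashCode() ^ value.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CachedIdentifier)) {
            return false;
        }
        CachedIdentifier other = (CachedIdentifier) o;
        // Comparing the cached ints first is a cheap way to reject most mismatches.
        return hash == other.hash
                && type.equals(other.type)
                && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        return hash; // just returns the pre-calculated int
    }
}
```

If type or value can change after the identifier is built, caching like this is wrong, and the object probably shouldn't be a map key in the first place.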

The second time, we had an application that was taking forever and a day to start up and prime its caches of data. One of many culprits turned out to be a custom class to implement the concept of a compound key:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class Key {
       private Map members;

       public Key(Map members){
               this.members = members;
       }

       // ...some stuff, including equals()...

       public int hashCode() {
               Iterator i = members.entrySet().iterator();
               List sortedSet = new ArrayList();
               while (i.hasNext()){
                       // raw (pre-generics) types, so each entry needs a cast
                       Map.Entry entry = (Map.Entry) i.next();
                       sortedSet.add(entry.getValue().toString().trim());
               }

               // sort is because members.entrySet() has no guaranteed order
               Collections.sort(sortedSet);

               Iterator j = sortedSet.iterator();
               String hashStr = "";
               while (j.hasNext()){
                       hashStr += j.next();
               }
               return hashStr.hashCode();
       }
}

This code is just downright awful, performance or no performance, though in its defense I will say it was written many years ago by a team of guys who were both junior and new to Java from VB6. Profiling the application revealed that it was spending 70% of its cache loading time simply calling Key.hashCode(), and far more time calling hashCode() than it even spent pulling the data it was caching out of the database.
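For comparison, here is a rough sketch of a compound key that does its hashing once. It is not what the real application ended up with. It swaps the sort-and-concatenate approach for Map.hashCode(), which is defined as the sum of the entries' hash codes and so doesn't care about iteration order. That means it hashes keys as well as values and does not trim strings, so it is only a drop-in replacement if equals() compares the maps the same way, which is an assumption because the original equals() isn't shown. CachedKey is an invented name.

```java
import java.util.HashMap;
import java.util.Map;

public final class CachedKey {
    private final Map<Object, Object> members;
    private final int hash; // computed once, in the constructor

    public CachedKey(Map<?, ?> members) {
        // Defensive copy, so later changes to the caller's map
        // cannot silently invalidate the cached hash.
        this.members = new HashMap<Object, Object>(members);
        // Map.hashCode() is order-independent, so no sorting and
        // no string concatenation are needed.
        this.hash = this.members.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CachedKey)) {
            return false;
        }
        CachedKey other = (CachedKey) o;
        return hash == other.hash && members.equals(other.members);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}
```

The defensive copy costs a little at construction time, but construction happens once per key while hashCode() happens on every map operation.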

Both these cases had two things in common: 1) the classes were very low level, ubiquitous, and probably written early in the project, and 2) they both ended up being a key in a Map. Both of them also prove a mantra I’m developing: if you’re trying to do performance tuning without the hard data you get from a profiler, then you are wasting your time. Without a profiler I would never have thought to look at them.