Archive for April, 2008

April 21st, 2008

SimpleDateFormat: Performance Pig

by Tim Cull

Just yesterday I came across this problem “in the wild” for the third time in my career: an application with performance problems caused by creating tons of java.text.SimpleDateFormat instances. So, I have to get this out there: creating a new instance of SimpleDateFormat is incredibly expensive and should be minimized. In the case that prompted this post, I was using JProfiler to profile code that parses a CSV file, and I discovered that 50% of the time it took to suck in the file and make 55,000 objects out of it was spent solely in the constructor of SimpleDateFormat. The code created and then threw away a new one every time it had to parse a date. Whew!
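To make the pattern concrete, here is a minimal sketch of what that CSV loader was effectively doing, next to the cheap fix for code that only ever runs on one thread. The class and method names (CsvDateParsing, parseEachWithNewFormat, parseAllWithOneFormat) and the yyyyMMdd pattern are hypothetical, not taken from the original application.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical helper illustrating the cost pattern described above.
class CsvDateParsing {

    // Anti-pattern: a brand-new SimpleDateFormat is constructed (and then
    // discarded) for every single date string that gets parsed.
    static List<Date> parseEachWithNewFormat(List<String> raw) throws ParseException {
        List<Date> dates = new ArrayList<>();
        for (String s : raw) {
            dates.add(new SimpleDateFormat("yyyyMMdd").parse(s));
        }
        return dates;
    }

    // Cheaper: build one formatter per call and reuse it for every row.
    // This is safe only because the local instance never leaves the
    // calling thread.
    static List<Date> parseAllWithOneFormat(List<String> raw) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        List<Date> dates = new ArrayList<>();
        for (String s : raw) {
            dates.add(fmt.parse(s));
        }
        return dates;
    }
}
```

Hoisting a local instance like this only helps when one thread does all the parsing; once the formatter is shared across threads, the problems below kick in.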

“Great,” you think, “I’ll just create one static instance, slap it in a field in a DateUtils helper class and life will be good.”

Well, more precisely, life will be good about 97% of the time. A few days after you roll that code into production you’ll discover the second cool fact that’s good to know: SimpleDateFormat is not thread-safe. Your code will work just fine most of the time and all of your regression tests will probably pass, but once your system gets under a production load you’ll see the occasional exception or, worse, a date that was silently parsed or formatted wrong.

“Fine,” you think, “I’ll just slap a ‘synchronized’ around my use of that one static instance.”

Ok, fine, you could do that and you’d be more or less ok, but the problem is that you’ve now taken a very common operation (date formatting and parsing) and crammed all of your otherwise-lovely, super-parallel application through a single pipe to get it done.
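For comparison, the synchronized approach might look like the following minimal sketch; the class name SynchronizedDateUtils and its two methods are hypothetical. It is correct, but every thread that parses or formats a date has to take the same lock.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper: one shared SimpleDateFormat guarded by a lock.
class SynchronizedDateUtils {

    private static final DateFormat FORMAT = new SimpleDateFormat("yyyyMMdd");

    // Only one thread at a time may use FORMAT; all others block here,
    // which is the "single pipe" bottleneck.
    static Date parseDate(String dateString) throws ParseException {
        synchronized (FORMAT) {
            return FORMAT.parse(dateString);
        }
    }

    // Formatting mutates the same internal state, so it needs the same lock.
    static String formatDate(Date date) {
        synchronized (FORMAT) {
            return FORMAT.format(date);
        }
    }
}
```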

What would be better is to use a ThreadLocal variable so you can have your cake and eat it, too:

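import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateUtils {

    public static final String MY_STANDARD_DATE_FORMAT = "yyyyMMdd";

    public static Date parseDate(String dateString) throws ParseException {
        return getFormat().parse(dateString);
    }

    // Each thread lazily creates its own SimpleDateFormat on first use and
    // reuses it afterwards, so no instance is ever shared between threads
    // and no locking is needed. (initialValue() is already called once per
    // thread, so it does not need to be synchronized.)
    private static final ThreadLocal<DateFormat> format = new ThreadLocal<DateFormat>() {
        @Override
        protected DateFormat initialValue() {
            return new SimpleDateFormat(MY_STANDARD_DATE_FORMAT);
        }
    };

    private static DateFormat getFormat() {
        return format.get();
    }
}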

I hope this code works because I wrote it on the fly and haven’t tried to run it, but you get the point.
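On Java 8 and later the same per-thread idea can be written more compactly with ThreadLocal.withInitial. The sketch below does that and then exercises it from a thread pool as a quick sanity check; the class ThreadLocalDateDemo and its methods are hypothetical and not part of the DateUtils class above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; it is not the DateUtils class from the post.
class ThreadLocalDateDemo {

    // Java 8+ shorthand for the anonymous ThreadLocal subclass: each thread
    // lazily gets its own SimpleDateFormat the first time it calls get().
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyyMMdd"));

    // Parses and re-formats a string using the calling thread's own formatter.
    static String roundTrip(String text) throws ParseException {
        SimpleDateFormat fmt = FORMAT.get();
        return fmt.format(fmt.parse(text));
    }

    // Submits one round-trip task per input to a fixed-size pool and counts
    // how many results differ from their input.
    static int countMismatches(List<String> inputs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String text : inputs) {
                Callable<Boolean> task = () -> roundTrip(text).equals(text);
                results.add(pool.submit(task));
            }
            int mismatches = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    mismatches++;
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }
}
```

With thread-confined formatters, countMismatches should return 0. Replacing FORMAT with a single shared, unsynchronized SimpleDateFormat is the kind of change that can make it nonzero, or throw, under enough load.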

April 10th, 2008

Documentation: Necessary Evil

by Tim Cull

I have a generally dim view of documentation. It’s expensive, difficult to do correctly, rarely updated, and even more rarely used. That said, however, I still think some of it is necessary. Those who say “it’s all in the code, just read it” are completely missing the point about what documentation is for.

To help myself make sure any document I write will be useful and kept up to date, I like to think of (useful) documentation as serving one of several purposes. No single document should try to serve more than one of these purposes at a time:

  • Give Me Context: Most team leads and architects create this kind of documentation on whiteboards several times a day, but for some reason never commit it to a document. This kind of information rarely changes once it’s decided and will be viewed by many different kinds of people of varying technical aptitude, so it’s worth spending the extra time to connect all the dots, draw pretty diagrams and bring out your best prose. An example of this kind of document is a high-level architecture diagram.

  • Keep a Reference: This should be all, and only, the information I have been told a hundred times but keep forgetting anyway. Assume the reader already has the context or knows where to find it and is here to get information quickly and get out. It should be very “fact-heavy” and structured in a way that’s easy to maintain, because it changes often. Wiki pages are a fantastic medium for these. An example of this kind of document is a table of server machine names and the JMX ports their production processes listen on.

  • Tell Me How To Do Something: The person reading this kind of document has a very specific task to achieve and may or may not care about the context. Keep focused on the task and give every last detail, down to the button clicks, URLs to type, etc. Assume your reader is a monkey who knows nothing. Litter the document liberally with links to context diagrams and reference pages; that way you can keep the flow clean and the reader moving from one step to the next without getting lost. Wiki pages are also a great medium for this kind of document. A great example of this kind of document is a set of instructions for a new developer on your team to set up his development environment.

  • Dictate the Law: This is the kind of document none of us likes creating and even fewer of us like reading. But unfortunately, sometimes you just have to lay down the law about things like coding standards, developer access to production environments, etc. Think of these like a legal document and try to be as absolutely clear and succinct as you can. Also, try to give some background on the reasons for each rule so readers have some context for interpreting the spirit of the rule when they encounter an ambiguous situation.

  • Define the API: I almost left this category out, but then decided that API documentation is indeed a unique beast. API documentation should cover only the public APIs that you actually intend people to use and should be generated by something like JavaDoc, with one or two summary pages pointing people to the right places to start for different kinds of tasks. Make sure that you spell out the contract you expect people to follow when calling your API.
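As an illustration of spelling out a contract, a JavaDoc comment on a small date-parsing method might read as follows. The class DateParsing and the exact rules it documents (null handling, the yyyyMMdd pattern, thread safety) are hypothetical, chosen only to show the kinds of promises and obligations that belong in a contract.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

/** Hypothetical utility used only to illustrate contract-style JavaDoc. */
final class DateParsing {

    // One formatter per thread, so parse() can honor its thread-safety promise.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyyMMdd"));

    private DateParsing() {
    }

    /**
     * Parses a date written in the compact {@code yyyyMMdd} form, for example
     * {@code "20080421"}.
     *
     * <p>This method may be called safely from multiple threads at once.
     *
     * @param text the date string; must not be {@code null}
     * @return a new {@link Date} for midnight of that day in the JVM's
     *         default time zone
     * @throws NullPointerException if {@code text} is {@code null}
     * @throws ParseException if the beginning of {@code text} cannot be parsed
     */
    static Date parse(String text) throws ParseException {
        if (text == null) {
            throw new NullPointerException("text");
        }
        return FORMAT.get().parse(text);
    }
}
```

The point is less the code than the comment: a caller can tell from the JavaDoc alone what input is allowed, what comes back, and what happens when the rules are broken.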

There are some documents you might notice I’ve left out. For example, I’m not a big believer in things like comprehensive class diagrams, data dictionaries, and down-to-the-letter technical specs. If you can generate those kinds of artifacts directly from your build, then great, please do it. Otherwise I tend to agree with the “read the code” purists and don’t think their usefulness outweighs their expense. Likewise, I don’t really believe in support “run books” and knowledge bases, because every support issue is different. If you know the context, then you’ve got 90% of what you need, and a searchable ticketing system will take care of the rest.