Archive for September, 2009

September 8th, 2009

Using a Profiler — Calling For Backup

by Tim Cull

This first post in my Calling For Backup series is a quick argument for why the very first thing anyone should do when tackling a performance problem is attach a profiler.

The Scene

You have a slow app. The team is throwing around wild ideas about how to fix it. Maybe you’ve even tried a few and, in spite of being expensive to implement, they still didn’t make much difference. Your user community is jaded, upper management is hounding you, and you can’t sleep. A few of the more talented engineers on your team have tried to use a profiler and either couldn’t get the money to buy one or had issues trying to get it to work.

The Real Problem

You’re in guess-and-firefight mode. You need to take a step back and invest in attaching a profiler even though your first shallow pass (if any) was unsuccessful. Really, it will be well worth the investment. Your stakeholders, though, will not give you any time or budget to spend on silly things like profilers.

Why It’s Hard to Explain

By definition you can’t predict to your stakeholders ahead of time what a profiler will find or how big its impact will be. There’s a small chance it won’t find anything and you’ll look like an idiot.

The Rationale

It only takes one line of bad code to sink an app’s performance. In a 30,000 line application, there’s an excellent chance something is lurking in there.

There are a few free profilers, including the one that ships with JDK 1.6. Even paid profilers cost less than a thousand dollars for one license. A thousand dollars is equivalent to about a day and a half of a single developer’s time, something you would easily eat with a single hour-long meeting in which a team of ten brainstorms ideas out of thin air (the only alternative to using a profiler).

Some Real Stories

I’ve personally profiled three separate applications that had never been profiled before, and each one turned up big problems that had easy fixes. In one example, a bad implementation of hashCode was bringing a GUI to its knees. In a second example, the constructor of SimpleDateFormat was eating up literally half the processing time in an application’s workflow.
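For the SimpleDateFormat case, the usual fix is to stop constructing the formatter on every call. The sketch below is only an illustration under assumptions: the class name, the date pattern, and the UTC time zone are all made up, and the original application may have been fixed differently. It keeps one formatter per thread because SimpleDateFormat is not thread-safe.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: the class name, pattern, and time zone are illustrative assumptions.
class CachedDateFormat {

    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // SimpleDateFormat is not thread-safe, so each thread lazily builds and keeps its own instance.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(CachedDateFormat::newFormat);

    private static SimpleDateFormat newFormat() {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    // Slow path: pays the constructor cost on every single call.
    static String formatSlow(long epochMillis) {
        return newFormat().format(new Date(epochMillis));
    }

    // Fast path: reuses the formatter already built for the current thread.
    static String formatCached(long epochMillis) {
        return FORMAT.get().format(new Date(epochMillis));
    }
}
```

Both methods return the same string; the difference only shows up in a profiler, where the constructor disappears from the hot path once callers switch to formatCached.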

In a third example I found that up to 3GB of an application’s 5GB heap was being taken up with an irritating behavior in java.io.ObjectOutputStream. 5GB heap you say? Yes, for years instead of attaching a profiler (which makes this problem stick out like a sore thumb) the team had just been increasing the heap size until they didn’t run out of memory any more. But the problem is that when even the best JVM does a full garbage collection on a 5GB heap it stops the world for 90 seconds at a time.
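The post doesn’t say which ObjectOutputStream behavior was responsible. One well-known candidate, and it is only an assumption that it was the culprit here, is that the stream remembers every object it has written so it can emit short back-references for repeats, and it keeps those objects reachable until reset() is called or the stream is closed. The sketch below (class and method names are hypothetical) makes that memory visible by comparing how many bytes come out with and without reset().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; it only illustrates the stream's "already written" bookkeeping.
class StreamResetDemo {

    // Writes the same 1 KB payload `times` times and returns the total bytes emitted.
    static int bytesWritten(int times, boolean resetBetweenWrites) {
        byte[] payload = new byte[1024];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            for (int i = 0; i < times; i++) {
                out.writeObject(payload);
                if (resetBetweenWrites) {
                    // Discards the stream's record of objects written so far,
                    // so it no longer holds references to them.
                    out.reset();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and flushed) by now, so the count is complete.
        return sink.size();
    }
}
```

Without reset(), the repeated writes shrink to tiny back-references, which proves the stream is still holding on to the payload. On a long-lived stream that writes millions of distinct objects, that same bookkeeping is what pins them in the heap. Calling reset() periodically trades a few extra bytes on the wire for letting the garbage collector reclaim them.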

I’ve also gotten lucky once, in a case where a profiler would have caught the problem sooner. That time, I was working on a trading system that had a nasty memory problem. After just a short while, especially under load, the system would max out its heap and fall over dead. My boss placed a bounty on the problem. When that didn’t work, we had consultants in from the outside and locked them and our two best engineers in a room for two weeks. Even that didn’t work. Finally, I happened to get lucky one day and followed a strange bit of code to the culprit, deep inside a hand-rolled logging class: the code was happily putting every single log message into a Vector that was supposed to be drained off on a socket connection. The problem was that we had long since stopped hooking up any socket listeners, so instead all the logging messages were stacking up in the Vector, living forever. Under load, of course, the log messages stacked up faster.

This last example was from the days before I’d discovered profilers. Don’t ask me why the (very expensive, by the way) consultants hadn’t hooked up a profiler, but if any of us had just done it in the first place then that Vector would have stuck out like a beacon in the night: a huge, glowing sign saying “there’s your problem right there.”
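The Vector story is a classic unbounded-buffer leak: a producer keeps adding and nothing ever drains. One common defense, sketched here with a hypothetical class that is not the trading system’s actual code, is to cap the buffer and drop the oldest entries when no consumer is keeping up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical bounded replacement for an unbounded Vector of log messages.
class BoundedLogBuffer {

    private final ArrayDeque<String> messages = new ArrayDeque<>();
    private final int capacity;
    private long dropped;

    BoundedLogBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Adds a message; if the buffer is full, the oldest message is discarded first.
    synchronized void add(String message) {
        if (messages.size() == capacity) {
            messages.removeFirst();
            dropped++;
        }
        messages.addLast(message);
    }

    // Hands all buffered messages to a consumer (for example a socket writer) and empties the buffer.
    synchronized List<String> drain() {
        List<String> drained = new ArrayList<>(messages);
        messages.clear();
        return drained;
    }

    synchronized int size() {
        return messages.size();
    }

    // Number of messages discarded because nothing drained the buffer in time.
    synchronized long droppedCount() {
        return dropped;
    }
}
```

With a cap in place, a missing listener costs you old log lines instead of the whole heap, and a steadily climbing droppedCount() is itself a signal that nobody is draining the buffer.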

Guerrilla Workarounds

It’s possible that you’ve tried using a profiler before and just couldn’t do it. Or maybe your company won’t pay for one. It’s true that sometimes getting these guys to work is a challenge; in the 5GB heap example, the only way I could get the profiler to not choke trying to parse a heap file that big was to run the profiler in the Amazon cloud on a 64-bit Windows server.

So if you can’t get dedicated developer time from your stakeholders willingly, then this is one of the few times I’d suggest just taking it. Either add a little profiler setup to each story card or feature development, or squirrel away a developer and say he’s working on “audit issues”. I promise, it will be worth it. Once you’ve improved the performance of your application by one or two orders of magnitude, nobody will mind you having taken the time.

September 8th, 2009

Calling For Backup: A New Series

by Tim Cull

I’m starting a new series on this blog today called “Calling for Backup.”

We’ve all had those conversations, where we’re arguing something that really should be a no-brainer and, yet, the person we’re arguing with is…well…arguing. I find these situations difficult because trying to explain my reasoning is kind of like explaining why I breathe.

ThickHead: “Why do you keep breathing, it looks like so much work!”

Me: “Because if I stop I’ll die.”

ThickHead: “Nahh, I’ve never heard of anyone dying because they decided to stop breathing. Prove it!”

I’ve finally learned that a very effective tool in these situations is a story. The more real, the better. But my biggest problem is that I don’t walk around cataloging stories so I can just whip them out on demand. I doubt you do, either.

So my series, “Calling for Backup,” will be an attempt to catalog stories in a format that makes them easy to reference. Next time, I hope your conversation will be more like this:

ThickHead: “Why do you want me to attach a profiler to our system to find its performance bottlenecks? We haven’t used one in ten years and we’re fine. Besides, I’m absolutely convinced if we move from Weblogic to JBoss everything will be ten times faster.”

You: “If you haven’t attached a profiler in ten years I will bet you money you’ll find juicy, easy to fix problems.”

ThickHead: “Nah, it’s too much work. Jonny tried it once in 2003 and our app server fell over because it was too slow. Plus it will take at least, what, a full day of a developer’s time? Thanks anyway.”

You: “Here’s the thing. Tim Cull found two different problems on two different real systems that manage billions of dollars using a profiler. And they were easy to fix and there’s no way anyone would have found them without a profiler.”

ThickHead: “Oh, I see what you’re saying. Well, ok, you can have a day or so.”

Stay tuned. The first Call for Backup will be about exactly that: why you should use a profiler.