Archive for July, 2009

July 9th, 2009

Another Fast Way to Kill Performance: Over Logging

by Tim Cull

I’ve seen the problem I’m about to describe “in the wild” several times now, almost as much as I’ve seen the SimpleDateFormat problem.

Put simply, there is no such thing as a free lunch. If you log every little thing your application is doing in production then you will have performance problems, period. If you find yourself saying “But I need all that logging, how else am I going to know where I am and recover when my app crashes in production?” then you also have to ask yourself “How many times have I crashed in production because I’m doing too much logging?”

I have seen two applications now–serious applications that handle serious money–trying to log information at rates up to ten screens full of SQL and “I am here now” kind of stuff per second. For many seconds back-to-back. We’re talking 100MB of logging information in less than a minute.

Now this doesn’t cause performance problems in the straight-forward way you would think it does thanks mostly to the wizardry of the log4j folks. Fortunately for Java developers everywhere, they wrote a nice, bullet-proof logging package that can swallow those logs whole seemingly instantaneously. They do this by allowing you to chain together an asynchronous logger and any other logger like a file logger, a console logger, or even a socket logger (or even all three, which I’ve also seen “in the wild”).

The performance problem comes around in a more indirect way. Usually when I get to groups with this kind of problem they’ve been running for six months or a year with a giant heap (i.e. 2GB to 5GB) and are wondering why their application pauses so often (especially under load) for up to 60 seconds to do full garbage collection runs. And they often suspect they have a “memory leak” because those garbage collection runs fail to recover much memory. Usually in these cases I find all kinds of stuff, but the biggest and first find is usually a whopping application log.

Pause for a minute to ponder why it is an application can log 100’s of megs of log messages without slowing down. Remember again that nifty log4j asynchronous logger that stands between your code and the actual file system. What’s happening in there is that log4j is taking in your log message, putting it in an in-memory buffer as fast as possible and returning control to your application thread almost immediately. Then, a separate thread (or threads) is in charge of draining that in-memory buffer to disk and in an ideal, fairyland world the maximum speed you can write to a disk is ~150MB/sec. In reality land it might be one or two orders of magnitude slower (especially if your 5GB heap is thrashing virtual pages around on disk). All this is fine and dandy as long as your application doesn’t reach more than 80% or so of its heap. When it does, Java starts to try to garbage collect. But because all those log messages waiting to be written to disk are still referenced by that in-memory buffer, it can’t collect them. The application (including, critically, trying to write out to disk) slows down. The logging back log continues to pile on. Your application enters a death spiral where it’s maxed out its heap and can’t claim more. If you look at your heap usage in something like jconsole, you see a tight squiggly line right at your max memory setting that stubbornly refuses to come down.

Right about then the application team decides it’s got a memory leak. An honest-to-goodness memory leak is very rare in the Java world and often it has to do with really hairy classloader circular references and other nasty buggies. I have only ever seen one, real Java memory leak in my career and funnily enough it had to do with a home-grown (not log4j) logging framework that just refused to let go of logging messages.

Luckily for the application team, they probably don’t have a real memory leak. What they do have is a process that’s feeding new information into one end of a pipe faster than it can drain it out the back end. When looking at heap usage and garbage collection from jconsole, both this “bulging pipe” problem and a real memory leak look the same: memory goes up, garbage collector runs, memory stubbornly refuses to go down, eventually you get the dreaded OutOfMemoryError. But, importantly, the fix to this bulging pipe problem is much simpler: just remove some logging.

And don’t even get me started on the headache you’ll have if you’re using the “+” operator to concatenate Strings for those log messages. Or if you have both a file logger and a console logger, but your startup script does something like this: “java myApp >> stdout.log”