Archive for November 18th, 2009

November 18th, 2009

Caching at Scale, Architecture Reviews, and Hadoop — QConSF 2009 Impressions

by Tim Cull

I’m at QConSF all this week, so you’ll get to hear my impressions of every session I go to. Lucky you!

Today is the start of the shorter sessions, so you get a three-for-one deal.

1) Caching at Scale with Alex Miller
Miller works for Terracotta, so most of what he concentrated on was EHCache and Terracotta. Much of the session had to do with configuring and using each of those tools, but I did get a couple of good reminders about what’s good to cache and what isn’t. Specifically, before caching something, make sure it has good “locality” (i.e. the same piece of data tends to be asked for in clumpy bursts of time) and a good distribution (i.e. the majority of people ask for a small subset of the total data universe).
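
To make the locality point concrete, here is a minimal sketch of my own (not something from the talk, and not EHCache or Terracotta code). It assumes a tiny LRU cache built on the JDK's LinkedHashMap and replays two made-up request sequences that ask for the same keys equally often, once in clumps and once in a cycle. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not EHCache or Terracotta API.
class LruCacheSketch {

    // A tiny LRU cache using LinkedHashMap's access-order mode.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = order entries by access, not insertion
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // evict the least recently used entry
        }
    }

    // Replays the requested keys and returns the fraction served from the cache.
    static double hitRate(int[] requests, int capacity) {
        LruCache<Integer, String> cache = new LruCache<>(capacity);
        int hits = 0;
        for (int key : requests) {
            if (cache.get(key) != null) {
                hits++;
            } else {
                cache.put(key, "value-" + key); // stand-in for an expensive load
            }
        }
        return requests.length == 0 ? 0.0 : (double) hits / requests.length;
    }

    public static void main(String[] args) {
        int[] clumpy = {1, 1, 1, 2, 2, 2, 3, 3, 3}; // same key requested in bursts
        int[] cyclic = {1, 2, 3, 1, 2, 3, 1, 2, 3}; // same keys, no bursts
        System.out.println("clumpy hit rate: " + hitRate(clumpy, 2));
        System.out.println("cyclic hit rate: " + hitRate(cyclic, 2));
    }
}
```

With room for only two entries, the clumpy sequence gets six hits out of nine requests while the cyclic one gets none, even though both ask for each key three times. A real workload is messier, so treat this as a way to reason about the idea rather than a benchmark.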

2) Lessons Learned from Architecture Reviews with Rebecca Wirfs-Brock
Wirfs-Brock opened with two slides showing two different ideas of “collaborative.” In one, all the stakeholders and reviewers of an architecture gather together in harmony and all are shooting for the same goal for the common good. In the other, they only collaborate in the sense that the conquered collaborate with their occupying army. It’s important to know which kind of situation you’re in before picking your toolset to deal with it. I was a little shaken when she showed a slide of my boss’ book and said it was an example of a toolset to use in the occupying army kind of collaboration. What does that say about my day job?!

My best takeaway from the talk is that it’s useful to clearly organize your architectural feedback into buckets:
1) Recommendations — we really think you need to do these and not doing them would be a mistake
2) Suggestions — if you do these I predict they will make you happy, but you won’t miss them if you don’t do them
3) Observations — a place to put statements about perceived problems that aren’t really problems, or point out good choices that should be kept

3) Hadoop with Philip Ziegler
Hadoop is a system for running massive calculations over massive amounts of data. Ziegler took us through an overview of it that was good and engaging, but really not much different from what you can get reading the web site.

November 18th, 2009

Domain Specific Languages with Ola Bini and Martin Fowler — QConSF 2009 Impressions

by Tim Cull

I’m at QConSF all this week, so you’ll get to hear my impressions of every session I go to. Lucky you!

In college, I never took the “compilers” class that most of my other classmates took. For the first five years of my career I felt smug and superior for not having wasted the time, but I’ve spent every year since then regretting the decision.

Domain Specific Languages (or DSLs) are not a new concept (how long has “make” been around?). But they have been catching a lot of renewed attention lately. We are seeing a convergence of enabling technologies that make creating them easier: dynamic languages like Ruby and Python and easy-to-use parser generators like ANTLR. This ease is important because, by definition, a DSL has to be specific to a domain and therefore you can’t spread the cost of its creation over very many projects.

Nowadays I thoroughly regret not having taken the compilers class, but Bini and Fowler’s day-long session yesterday helped fill some of the gap. The session was very technical and (frankly) very dry. They talked a great deal about parsers, parse trees, and semantic models. But the density was warranted because you can’t really understand the full power of a DSL until you know those concepts. Fowler and Bini gave us just enough background not to hang ourselves with our shiny new ropes.

One point they hammered again and again was that you need to keep your syntax, your semantic model, and your execution separate. For example, if you were to write a new kind of Spring configuration file format in JSON, then JSON would be your syntax, the Spring BeanDefinition interface would be your semantic model, and the Spring GenericApplicationContext might be your executing code. Many DSL implementers might be tempted to leap directly from parsing the input to calling code on the fly, but according to the presenters that usually leads to heartache as your DSL becomes more complex.
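
Here is a minimal sketch of that three-way split, written by me rather than taken from the session. It assumes a toy line-based language with two made-up commands ("add" and "mul"); the class names are hypothetical and it does not use Spring at all. The parser only produces the semantic model, and the executor only consumes it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy DSL; the "add"/"mul" commands are invented for illustration.
class DslLayersSketch {

    // Semantic model: plain data with no parsing or execution logic.
    static final class Step {
        final String op;
        final int operand;

        Step(String op, int operand) {
            this.op = op;
            this.operand = operand;
        }
    }

    // Syntax layer: turns text into the semantic model and nothing else.
    static List<Step> parse(String source) {
        List<Step> steps = new ArrayList<>();
        for (String line : source.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad line: " + trimmed);
            }
            steps.add(new Step(parts[0], Integer.parseInt(parts[1])));
        }
        return steps;
    }

    // Execution layer: interprets the model and knows nothing about the text format.
    static int execute(List<Step> steps, int start) {
        int value = start;
        for (Step step : steps) {
            switch (step.op) {
                case "add":
                    value += step.operand;
                    break;
                case "mul":
                    value *= step.operand;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown op: " + step.op);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        List<Step> model = parse("add 3\nmul 4");
        System.out.println(execute(model, 1)); // (1 + 3) * 4
    }
}
```

Because the executor only sees Step objects, you could swap the text syntax for JSON or XML without touching it, which is the payoff the presenters were describing.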

They also went into detail about the difference between an External DSL (something you have to write a parser for, like Ant) and an Internal DSL (basically helper functions on top of an existing language, like Rake).
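
For a feel of the internal flavor, here is a small sketch of my own in Java rather than Ruby. It is not Rake and not from the session; the task and dependsOn helpers are hypothetical names, and it assumes nothing beyond ordinary method chaining in the host language.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical internal DSL: just helper methods layered on plain Java.
class InternalDslSketch {

    static final class Task {
        final String name;
        final List<String> dependencies = new ArrayList<>();

        Task(String name) {
            this.name = name;
        }

        // Returning "this" lets calls chain so the code reads like a declaration.
        Task dependsOn(String other) {
            dependencies.add(other);
            return this;
        }

        String describe() {
            return name + " <- " + dependencies;
        }
    }

    // Entry point of the DSL: a static helper that reads like a keyword.
    static Task task(String name) {
        return new Task(name);
    }

    public static void main(String[] args) {
        Task pkg = task("package").dependsOn("compile").dependsOn("test");
        System.out.println(pkg.describe());
    }
}
```

There is no parser here: the host language's own compiler does that job, which is the trade-off between the two styles.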

November 18th, 2009

QCon 2009 Impressions: Java Performance with Kirk Pepperdine

by Tim Cull

I’m at QConSF all this week, so you’ll get to hear my impressions of every session I go to. Lucky you!

My first all-day tutorial at QCon 2009 was “Java Performance Tuning” with Kirk Pepperdine. He spent much of the day encouraging us to spend time classifying the nature of the performance problem before trying to actually solve it.

Specifically, he asserted that you can diagnose much of an application’s problem without knowing anything about its source (or even ever having seen its source). His methodology basically boils down to this: First, use monitoring tools to classify your problem:
1) Is it user CPU bound, I/O bound, system CPU bound, or memory bound?
2) Is it Application layer, JVM layer, or OS layer?

Then, and only then, use a profiler plus your knowledge of the source to pin the problem down and solve it.

Basically, most of the morning was about codifying what’s really common sense (but what isn’t at the end of the day?). In particular, you should measure and monitor, then make hypotheses based on monitoring, then use profiling to pinpoint the problem and support your hypotheses. Fix the problem and re-sample your monitoring. Repeat until the user is happy.

I’ve seen what happens without that discipline many times on projects that have performance problems. Each developer has his own pet theory based on a hunch and will not let go. If more projects started with measurements and testable hypotheses, they’d be in a much better place.

Pepperdine also introduced several tools, the best of which was one I’ve been searching for but couldn’t find: IronEye SQL. He confirmed what I’d suspected–that the project was dead. But I was happy to discover that he personally had the source and was trying to revive it! Yay!

The best technical tidbit I got out of the session was some insight about Collections classes that have a medium-term life. It’s possible to have a Collections class (say a LinkedList) live just long enough for some of its plumbing to make it into the “old” generation. If that happens, then even if that LinkedList goes out of scope and is eligible for garbage collection, it won’t actually get collected until the next full GC. Partial GCs only collect the “young” generation. That’s not necessarily the end of the world until you consider that many of the objects contained in the LinkedList might have been very short-lived and might still be in the young generation heap. But they can’t be garbage collected because the LinkedList’s plumbing in the old generation still refers to them. These are known as ‘zombie’ objects: objects that aren’t referred to by anything live any more, but nevertheless won’t get collected in a partial GC.
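
If you want to poke at this yourself, here is a rough sketch of mine (not Pepperdine's code) that creates the allocation pattern described above and reports how many collections each collector ran, using the JDK's standard GarbageCollectorMXBean. The class name, list counts, and buffer sizes are arbitrary assumptions. Whether any list nodes actually get promoted depends on your JVM, heap sizes, and collector, so this only sets up the conditions; it does not prove zombies exist.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedList;
import java.util.List;

// Hypothetical experiment; sizes and counts are arbitrary assumptions.
class MediumLivedListSketch {

    // Total collections reported so far by every collector in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Fills one list at a time with small buffers, then drops the list.
    // Returns the number of buffers allocated.
    static int churn(int lists, int itemsPerList) {
        int allocated = 0;
        for (int i = 0; i < lists; i++) {
            List<byte[]> mediumLived = new LinkedList<>();
            for (int j = 0; j < itemsPerList; j++) {
                mediumLived.add(new byte[1024]); // payload that dies with the list
                allocated++;
            }
            // The list becomes unreachable at the end of each iteration.
        }
        return allocated;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        int allocated = churn(200, 5_000);
        System.out.println("buffers allocated: " + allocated);
        System.out.println("collections during run: " + (totalCollections() - before));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount());
        }
    }
}
```

Running it with GC logging turned on and comparing the young and old collector counts is one way to see how much work is left for full collections.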