Archive for ‘Technology’

September 11th, 2010

It’s Official: I’m an InfoQ Editor Now

by Tim Cull

I’m pleased to announce I’ve got a side gig, writing the latest and greatest Java news over at InfoQ. It all started with my “8 tips for legacy Java code” article, after which they asked me to write regularly for them: one quick news post a week.

So, if you’ve got some Java news you think the rest of the world will be interested in, please let me know. Introductions to people I can interview are even better!

September 10th, 2010

Quick Props for a Fantastic User Experience: Xdebug

by Tim Cull

I just had one of the best user experiences ever trying to install Xdebug for PHP.

Get this: you paste information about your installation using the extremely common “phpinfo()” function and the website parses the free-text output and tells you:

  1. Exactly what to download and from where
  2. Exactly what to put in which configuration file with paths tailored to your setup and, here’s what kills me…
  3. Exactly where that config file is on your file system

Now that’s looking at things from your user’s point of view.

September 9th, 2010

Overdue: Apple Relaxes Restrictions on Developer Tools

by Tim Cull

Apple has finally removed the self-destructive restrictions they put on developer tools. I couldn’t believe they’d do that in the first place and it’s good to see they saw the light. The question is: is it too late? I for one vowed to choose Android over iPhone when I had the option and lifting the restrictions now hasn’t really changed that inclination.

September 1st, 2010

The little known and underappreciated HTML Header element

by Tim Cull

I happened across an HTML element I didn’t know existed today. The ‘header’, which is not to be confused with the ‘head’ element, is a section of the HTML body that can contain, you guessed it, header information like a table of contents or some titles. It’s part of HTML5, so current browsers support it, but older ones such as IE6 don’t recognize it natively.

I just figured I’d post something about this under-appreciated element here to boost its Google recognition a smidgen and separate it from its better-known, more-popular-at-parties cousin: the ‘head’.

August 6th, 2010

The Hyperpolyglot Cheat Sheet

by Tim Cull

I saw a good resource and a cool meme today: “hyperpolyglot”. That’s where I found some good side-by-side cheat sheets for scripting languages. As the days go on, I find myself working in many different languages simultaneously and it sure is nice to have a reminder around to look up the maddeningly different syntax for each.

August 5th, 2010

How To Include Other Velocity Templates From Apache Camel

by Tim Cull

I was surprised to discover today that, out of the box, it’s difficult to include one Velocity template from another if you are using Apache Camel. For once, a search of the Internet didn’t easily find the solution.

So, for the benefit of others, here is the solution….

July 31st, 2010

New InfoQ Article: Eight Quick Ways to Refresh Legacy Java Systems

by Tim Cull

I’m happy to announce a new article of mine on InfoQ: Eight Quick Ways To Improve Java Legacy Systems. In this article I explore different, easy ways to improve your legacy Java system.

Also, stay tuned for more news about me and InfoQ…

June 24th, 2010

How to avoid huge transactions with CMP Entity Beans on JBoss

by Tim Cull

By default, CMP Entity Beans on JBoss are set to require a transaction. Also by default, any time you touch any session or entity bean, your request thread takes out a lock on that entire object, even if you are only reading it and not updating it. Lastly, also by default, JBoss will make sure that for any given entity, there is only one instance of that entity in memory at a time.

All of these defaults have serious implications. For one, they imply that anything other than a toy application will likely become a de facto single-threaded application. Imagine, for example, that you have an earthquake tracking application. Your application might have an Entity Bean called Earthquake. After getting under way with the application, you realize there are different kinds of earthquakes: tectonic, volcanic, and man-made. These don’t merit having a full-on Earthquake subclass of their own, but maybe you want to model the types as a new Entity called EarthquakeType so that the application can be data-driven and new types can be added later without changing code. The vast majority (~90%) of earthquakes are tectonic, so most of what you ever display to a user will be “tectonic”.

So, you might have a web page that displays the last 40 earthquakes in descending chronological order in a table and also a count of how many different types. This could lead to innocent code like, say:

// Tally how many of the displayed earthquakes fall under each type
for (Earthquake earthquake : earthquakes) {
    typeSum[earthquake.getType().getId()]++;
}

The moment you call earthquake.getType() for the first earthquake in the list, you will lock the “tectonic” instance of the EarthquakeType Entity bean. This means that every other thread executing in the same JVM (if configured the default JBoss way) will most likely block (who doesn’t need to know what the earthquake type is, after all?) until this thread is done displaying its page. Even worse, if this thread is holding a lock that some other thread needs, and that other thread is holding a lock that this thread needs, then you have a deadlock. All of this in spite of the fact that actually updating an EarthquakeType is extremely rare because they are read-mostly.
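To make the blocking concrete, here is a minimal sketch in plain Java. It is only an analogy: it uses the standard java.util.concurrent locks rather than JBoss's own lock classes, and the class and method names (EntityLockSketch, secondReaderCanProceed) are made up for illustration. It shows that with one exclusive lock per instance a second reader cannot get in while the first holds the lock, whereas a shared read lock lets both readers through.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class EntityLockSketch {

    /** Returns true if a second thread can acquire the lock while the calling thread holds it. */
    static boolean secondReaderCanProceed(Lock lock) {
        boolean[] acquired = new boolean[1];
        lock.lock(); // first "request thread" touches the shared EarthquakeType
        try {
            Thread other = new Thread(() -> {
                // second request thread tries to read the same instance without waiting
                if (lock.tryLock()) {
                    acquired[0] = true;
                    lock.unlock();
                }
            });
            other.start();
            other.join(); // join() makes the write to acquired[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the second thread", e);
        } finally {
            lock.unlock();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        // One exclusive lock per entity instance: readers queue up behind each other.
        System.out.println("exclusive lock: " + secondReaderCanProceed(new ReentrantLock()));
        // A shared read lock: readers of read-mostly data do not block each other.
        System.out.println("shared read lock: "
                + secondReaderCanProceed(new ReentrantReadWriteLock().readLock()));
    }
}
```

Running it prints false for the exclusive lock and true for the shared read lock, which is the difference between every page view queuing behind “tectonic” and every page view reading it at once.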

A telltale sign that you are having this problem is seeing stack traces like this one:

org.jboss.util.deadlock.ApplicationDeadlockException: Application deadlock detected, resource=org.jboss.ejb.plugins.lock.QueuedPessimisticEJBLock@290df5c3, bean=

…snip…

at org.jboss.util.deadlock.DeadlockDetector.deadlockDetection(DeadlockDetector.java:69)
at org.jboss.ejb.plugins.lock.QueuedPessimisticEJBLock.waitForTx(QueuedPessimisticEJBLock.java:292)
at org.jboss.ejb.plugins.lock.QueuedPessimisticEJBLock.doSchedule(QueuedPessimisticEJBLock.java:230)

…snip.

At first, it’s tempting to fume at JBoss for having such conservative default settings. I know I did this morning as I was learning more about the details. But the fact is that they really have no choice. The application container has no idea that EarthquakeType is read-mostly. It doesn’t know if you will read it at the beginning of the request and then modify it 300 milliseconds later at the end of the request. So, it is forced to lump absolutely everything you touch into a giant transaction unless you tell it otherwise.

Now, the “telling it otherwise” is where things start to get tricky. Here, I really do think that JBoss hasn’t done us any favors. Maximizing your throughput and minimizing deadlocks is a multi-step process. If you do some steps but don’t do others, then nothing will change and you won’t know why.

So, here are the steps…

June 17th, 2010

Performance Anti-Pattern: Pre-Loading Caches on Startup

by Tim Cull

In-memory application caches are a fantastic way to improve application performance. In some cases, they can be done quickly and cheaply with stunning performance improvements. But in-memory caches can also kill application performance in myriad ways. For this post I’ll focus on only one: pre-loading caches on application start up.

I have worked on half a dozen different applications that pre-load caches on start up. In every one of those cases, the application server container would start up and then some kind of hacky, custom-built startup class would hold back client requests until some other custom code had queried various data sources and filled in-memory caches with data. Only when all caches had been filled were client requests allowed to proceed. Some of these applications had huge caches, well into the gigabyte range, and took ages to warm up.

This. Is. Bad.

It’s bad for the obvious reason that it takes longer for the application server to start up and be serviceable, thereby increasing the length of development and testing cycles. Across an entire project’s lifetime, the time spent with team members waiting for caches to pre-load on startup alone can add up to hundreds of thousands of dollars.

But the obvious reason isn’t even the worst reason. The worst reason pre-loading on startup is bad is that it leads developers to assume that the cache will always be populated and that it will always be populated with the entire dataset. This assumption is a disaster for your application because:

  • The logic to detect cache misses and fetch individual items on a miss never gets written or is written badly, and as a consequence…
  • Nothing can ever be evicted from the cache, which means…
  • Operational teams have no options when faced with memory or data consistency issues in production, and…
  • Keeping cache data synchronized with external “golden masters” is more difficult because you have to actually figure out the differences and re-query the golden source each time there’s a change instead of just invalidating the cache entry and lazily re-fetching it later, plus…
  • Buggy code tends to be written that assumes literally the same object is in memory at all times and anything can be attached to it, including objects that are in other caches, which leads to…
  • A tangled mess of cache inter-dependencies, which is very difficult to untangle a few years down the road when the team realizes cache pre-loading on startup is a bad idea and tries to do something about it.
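For contrast, here is a minimal sketch of a cache that never assumes it is full. Everything in it is hypothetical (the LazyCache class, its loader function, the tiny size limit); it is one simple way to get miss handling, eviction, and invalidation, not a recommendation of a particular library. A miss calls the loader, the least recently used entry is evicted once the limit is exceeded, and invalidating an entry just means it gets lazily re-fetched next time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical read-through cache: loads on a miss, evicts the least recently used entry. */
class LazyCache<K, V> {
    private final Function<K, V> loader; // stands in for a query against the real data source
    private final Map<K, V> entries;
    private int misses;

    LazyCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first in line for eviction
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {          // cache miss: fetch just this one item
            misses++;
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    /** Drops one entry; the next get() for that key re-fetches it from the source. */
    public synchronized void invalidate(K key) {
        entries.remove(key);
    }

    public synchronized int misses() {
        return misses;
    }

    public synchronized int size() {
        return entries.size();
    }
}

class LazyCacheDemo {
    public static void main(String[] args) {
        LazyCache<Integer, String> cache = new LazyCache<>(2, id -> "earthquake-" + id);
        cache.get(1);        // miss
        cache.get(1);        // hit
        cache.get(2);        // miss
        cache.get(3);        // miss, evicts key 1
        cache.invalidate(2); // e.g. the golden master changed
        cache.get(2);        // miss again, lazily re-fetched
        System.out.println(cache.misses() + " misses, " + cache.size() + " entries");
    }
}
```

Because callers only ever go through get(), nobody can write code that depends on the whole dataset being resident, which is the point of the list above.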

“You’re just being alarmist, Tim. Those things never really happen because application teams know better than to get into that situation,” you say. Balderdash. Every single pre-loading application I’ve ever seen has eventually fallen into exactly the scenario I just described. This is because of the Possibility Law: anything that is possible in your application’s code will eventually be done by some junior programmer somewhere. If you pre-load caches on startup, then you’ve made it possible for someone to assume caches are always full and always contain the entire data set. If you don’t pre-load, then you haven’t made that ubiquitous-data assumption possible.

But there is one draw-back to not pre-loading caches: when the application is cold and the cache is empty, warming the cache by grabbing data from the data store on a onesie-twosie basis is fantastically inefficient. I have a solution for this problem….

June 9th, 2010

Law of Preservation of Complexity

by Tim Cull

We developers and technologists are obsessed with abstractions and generalizations. Tell any developer they will be implementing a series of business rules for an insurance company, and they’ll roll their eyes. Tell the same developer they will be implementing a generic rules framework that lets business users specify any rule they want without any coding and they will start to salivate.

This obsession with abstraction and generalization is usually a good thing. It drives us to see patterns that others don’t see and invent new ways of thinking that drive the productivity improvements that keep our economy booming.

But there is a limit to how useful generalization can be. That limit comes when it slams up against what I’ll call the Law of Preservation of Complexity. My Law of Preservation of Complexity says: “There is only so much of a problem that can be generalized away. Once the theoretical maximum generalization has been achieved, what remains is the fundamental problem that must be solved through brute force. Any attempt to further generalize will actually be adding complexity or obscuring the problem.”

Imagine the case of squeezing a semi-inflated water balloon. The water inside is the fundamental complexity of a problem and the shape of the balloon is how you’ve chosen to generalize the problem. If you squeeze one end of the balloon to make it smaller, then the other end of the balloon will get bigger. If you poke the other end with your finger, then the sides of the balloon will get bigger. No matter what you do to the balloon, it will still hold the same amount of water.

Now imagine this concept in a system. Say someone designed a system with data storage that’s nothing more than one giant table of name-value pairs. “This is fantastic!” the architect will say, “we can change the data model without having to change the database!” True and not true. The fact is there’s a fundamental complexity (relationships between entities that could have been represented with tables and foreign keys) that has simply been pushed up into the application layer. The complexity did not go away; it was simply moved somewhere else, possibly into a technology that’s less adept at handling it. In the worst-case scenario, the complexity was moved to more than one place; it was shot-gunned across the system at random instead of compartmentalized in one logical place.
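Here is a rough sketch of what “pushed up into the application layer” can look like. The Row shape, the “typeId” attribute, and the sample data are all invented for illustration, and the code uses a Java record, so it assumes Java 16 or later. The grouping that table definitions would have given you and the check that a foreign key would have enforced both turn into hand-written application code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NameValueSketch {

    /** One row of the hypothetical generic table: (entity id, attribute name, attribute value). */
    record Row(String entityId, String name, String value) {}

    /** Rebuilds per-entity attribute maps; with real tables the schema provides this shape. */
    static Map<String, Map<String, String>> regroup(List<Row> rows) {
        Map<String, Map<String, String>> entities = new HashMap<>();
        for (Row row : rows) {
            entities.computeIfAbsent(row.entityId(), id -> new HashMap<>())
                    .put(row.name(), row.value());
        }
        return entities;
    }

    /** Hand-written stand-in for a foreign key: finds entities whose "typeId" points at nothing. */
    static List<String> danglingTypeReferences(Map<String, Map<String, String>> entities) {
        List<String> broken = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entity : entities.entrySet()) {
            String typeId = entity.getValue().get("typeId");
            if (typeId != null && !entities.containsKey(typeId)) {
                broken.add(entity.getKey());
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("type-1", "label", "tectonic"),
                new Row("quake-1", "magnitude", "6.1"),
                new Row("quake-1", "typeId", "type-1"),
                new Row("quake-2", "typeId", "type-9")); // nothing stops this row from being stored
        System.out.println(danglingTypeReferences(regroup(rows))); // prints [quake-2]
    }
}
```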

Whenever I have a team proposing an abstraction or a generalization to me, I ask, “is this really removing complexity altogether, or is it simply pushing it around, like a toddler would do with lima beans he doesn’t want to eat on his plate?” Only if the proposal passes that test do I recommend proceeding.

So if some complexity is inevitable, then what should we do with it? I recommend compartmentalizing it in one place. Put it behind an iron-clad facade with a well-defined interface. Behind the interface, be as ugly and brute-force as necessary, but beyond the interface keep the system elegant and generalized. Then at least the complexity is obvious, discoverable, and easier to maintain.
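As a small sketch of that shape, take the insurance rules from the top of this post. The interface name, the implementing class, and every number in the rules are made up for illustration. Callers see one narrow interface, and the brute-force business rules sit in exactly one class behind it.

```java
/** Hypothetical facade: the rest of the system only ever sees this interface. */
interface PremiumCalculator {
    int annualPremiumCents(int driverAge, int priorClaims);
}

/** The irreducible, brute-force part lives here and only here. Rules are invented examples. */
final class HandCodedPremiumCalculator implements PremiumCalculator {
    @Override
    public int annualPremiumCents(int driverAge, int priorClaims) {
        int premium = 50_000;                  // base premium
        if (driverAge < 25) {
            premium += 20_000;                 // young-driver surcharge
        }
        if (driverAge >= 70) {
            premium += 10_000;                 // senior-driver surcharge
        }
        if (priorClaims == 1) {
            premium += 7_500;                  // single prior claim
        } else if (priorClaims >= 2) {
            premium += 25_000;                 // repeat claims
        }
        return premium;
    }
}

class FacadeDemo {
    public static void main(String[] args) {
        PremiumCalculator calculator = new HandCodedPremiumCalculator();
        System.out.println(calculator.annualPremiumCents(22, 1)); // prints 77500
    }
}
```

The if-statements are not pretty, but anyone hunting for the pricing rules knows where to look, and the implementation can be swapped out without touching its callers.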