Archive for ‘Technology’

June 24th, 2010

How to avoid huge transactions with CMP Entity Beans on JBoss

by Tim Cull

By default, CMP Entity Beans on JBoss are set to require a transaction. Also by default, any time you touch any session or entity bean, your request thread takes out a lock on that entire object, even if you are only reading it and not updating it. Lastly, also by default, JBoss will make sure that for any given entity, there is only one instance of that entity in memory at a time.

All of these defaults have serious implications. For one, they imply that anything other than a toy application will likely become a de facto single-threaded application. Imagine, for example, that you have an earthquake tracking application. Your application might have an Entity Bean called Earthquake. After getting under way with the application, you realize there are different kinds of earthquake: tectonic, volcanic, and man-made. These don’t merit having a full-on Earthquake subclass of their own, but maybe you want to model the types as a new Entity called EarthquakeType so that the application can be data-driven and new types can be added later without changing code. The vast majority (~90%) of earthquakes are tectonic, so most of what you ever display to a user will be “tectonic”.

So, you might have a web page that displays the last 40 earthquakes in descending chronological order in a table and also a count of how many earthquakes there were of each type. This could lead to innocent code like, say:

for (Earthquake earthquake : earthquakes) {
 typeSum[earthquake.getType().getId()]++;
}

The moment you call earthquake.getType() for the first earthquake in the list, you will lock the “tectonic” instance of the EarthquakeType Entity bean. This means that every other thread executing in the same JVM (if configured the default JBoss way) will most likely block (who doesn’t need to know what the earthquake type is, after all?) until this thread is done displaying its page. Even worse, if this thread is holding a lock that some other thread needs, and that other thread is holding a lock that this thread needs, then you have a deadlock. All of this in spite of the fact that actually updating an EarthquakeType is extremely rare because they are read-mostly.
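To make the cost of an exclusive lock on read-mostly data concrete, here is a minimal plain-Java analogy. It is not how JBoss implements entity locks; the class and method names are made up for illustration, and the only library calls used are the standard java.util.concurrent locks. It shows one "request thread" holding a lock while a second thread tries to read the same object.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Plain-Java analogy only; hypothetical names, not JBoss internals.
class ReadMostlyLockDemo {

    /** Holds the lock on this thread, then reports whether a second thread can also take it. */
    static boolean secondThreadCanAcquire(Lock lock) {
        ExecutorService otherRequest = Executors.newSingleThreadExecutor();
        lock.lock(); // the first "request thread" touches the shared EarthquakeType
        try {
            return otherRequest.submit(() -> {
                boolean acquired = lock.tryLock(); // non-blocking attempt by the second thread
                if (acquired) {
                    lock.unlock();
                }
                return acquired;
            }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            lock.unlock();
            otherRequest.shutdown();
        }
    }

    /** Exclusive lock: every reader is treated as if it might write. */
    static boolean exclusiveLockAdmitsSecondReader() {
        return secondThreadCanAcquire(new ReentrantLock());
    }

    /** Shared read lock: readers only exclude writers, not each other. */
    static boolean readLockAdmitsSecondReader() {
        return secondThreadCanAcquire(new ReentrantReadWriteLock().readLock());
    }

    public static void main(String[] args) {
        System.out.println("exclusive lock admits second reader: " + exclusiveLockAdmitsSecondReader());
        System.out.println("read lock admits second reader: " + readLockAdmitsSecondReader());
    }
}
```

With the exclusive lock the second reader is turned away even though nobody is writing, which is the situation the default configuration puts every thread in. With the shared read lock both readers proceed at once.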

A telltale sign that you are having this problem is seeing stack traces like this one:

org.jboss.util.deadlock.ApplicationDeadlockException: Application deadlock detected, resource=org.jboss.ejb.plugins.lock.QueuedPessimisticEJBLock@290df5c3, bean=

…snip…

at org.jboss.util.deadlock.DeadlockDetector.deadlockDetection(DeadlockDetector.java:69)
at org.jboss.ejb.plugins.lock.QueuedPessimisticEJBLock.waitForTx(QueuedPessimisticEJBLock.java:292)
at org.jboss.ejb.plugins.lock.QueuedPessimisticEJBLock.doSchedule(QueuedPessimisticEJBLock.java:230)

…snip.

At first, it’s tempting to fume at JBoss for having such conservative default settings. I know I did this morning as I was learning more about the details. But the fact is that they really have no choice. The application container has no idea that EarthquakeType is read-mostly. It doesn’t know if you will read it at the beginning of the request and then modify it 300 milliseconds later at the end of the request. So, it is forced to lump absolutely everything you touch into a giant transaction unless you tell it otherwise.

Now, the “telling it otherwise” is where things start to get tricky. Here, I really do think that JBoss hasn’t done us any favors. Making sure you maximize your throughput and minimize deadlocks is a multi-step process. If you do some steps but skip others, then nothing will change and you won’t know why.

So, here are the steps…

June 17th, 2010

Performance Anti-Pattern: Pre-Loading Caches on Startup

by Tim Cull

In-memory application caches are a fantastic way to improve application performance. In some cases, they can be done quickly and cheaply with stunning performance improvements. But in-memory caches can also kill application performance in myriad ways. For this post I’ll focus on only one: pre-loading caches on application start up.

I have worked on half a dozen different applications that pre-load caches on start up. In every one of those cases, the application server container would start up and then there would be some kind of hacky, custom-built startup class that held back client requests until some other custom code had queried various data sources and filled in-memory caches with data. Only when all caches had been filled were client requests allowed to proceed. Some of these applications had huge caches–well into the gigabyte range–and took ages to warm up.

This. Is. Bad.

It’s bad for the obvious reason that it takes longer for the application server to start up and be serviceable, thereby increasing the length of development and testing cycles. Across an entire project’s lifetime, the time spent with team members waiting for caches to pre-load on startup alone can add up to hundreds of thousands of dollars.

But the obvious reason isn’t even the worst reason. The worst reason pre-loading on startup is bad is that it leads developers to assume that the cache will always be populated and that it will always be populated with the entire dataset. This assumption is a disaster for your application because:

  • The logic to detect cache misses and fetch individual items on a miss never gets written or is written badly, and as a consequence…
  • Nothing can ever be evicted from the cache, which means…
  • Operational teams have no options when faced with memory or data consistency issues in production, and…
  • Keeping cache data synchronized with external “golden masters” is more difficult because you have to actually figure out the differences and re-query the golden source each time there’s a change instead of just invalidating the cache entry and lazily re-fetching it later, plus…
  • Buggy code tends to be written that assumes literally the same object is in memory at all times and anything can be attached to it, including objects that are in other caches, which leads to…
  • A tangled mess of cache inter-dependencies, which is very difficult to untangle a few years down the road when the team realizes cache pre-loading on startup is a bad idea and tries to do something about it.

“You’re just being alarmist, Tim. Those things never really happen because application teams know better than to get into that situation,” you say. Balderdash. Every single pre-loading application I’ve ever seen has eventually fallen into exactly the scenario I just described. This is because of the Possibility Law: anything that is possible in your application’s code will eventually be done by some junior programmer somewhere. If you pre-load caches on startup, then you’ve made it possible for someone to assume caches are always full and always contain the entire data set. If you don’t pre-load, then you haven’t made that ubiquitous-data assumption possible.
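For contrast, here is a minimal sketch of a cache that never assumes it is full. LazyCache, its loader function, and its size limit are hypothetical names invented for this illustration, not taken from any library. Every read goes through a miss check, entries can be evicted or invalidated at any time, and a changed item is simply dropped and re-fetched on the next read.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration; LazyCache is not from any library.
class LazyCache<K, V> {
    private final Function<K, V> loader; // fetches ONE item from the golden source
    private final Map<K, V> entries;
    private int loads = 0;               // how often the golden source was queried

    LazyCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // An access-ordered LinkedHashMap gives simple least-recently-used eviction.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, loading it from the source on a miss. */
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    /** Drops one entry; the next get() lazily re-fetches it. */
    synchronized void invalidate(K key) {
        entries.remove(key);
    }

    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        LazyCache<Integer, String> types = new LazyCache<>(2, id -> "type-" + id);
        types.get(1);        // miss: loaded from the source
        types.get(1);        // hit: served from memory
        types.invalidate(1); // the golden master changed
        types.get(1);        // miss again: lazily re-fetched
        System.out.println(types.loadCount()); // prints 2
    }
}
```

Because callers can only reach data through get(), nobody can write code that depends on the whole data set being resident, and operations staff can shrink the size limit or invalidate entries without breaking anything.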

But there is one drawback to not pre-loading caches: when the application is cold and the cache is empty, warming the cache by grabbing data from the data store on a onesie-twosie basis is fantastically inefficient. I have a solution for this problem….

June 9th, 2010

Law of Preservation of Complexity

by Tim Cull

We developers and technologists are obsessed with abstractions and generalizations. Tell any developer they will be implementing a series of business rules for an insurance company, and they’ll roll their eyes. Tell the same developer they will be implementing a generic rules framework that lets business users specify any rule they want without any coding and they will start to salivate.

This obsession with abstraction and generalization is usually a good thing. It drives us to see patterns that others don’t see and invent new ways of thinking that drive the productivity improvements that keep our economy booming.

But there is a limit to how useful generalization can be. That limit comes when it slams up against what I’ll call the Law of Preservation of Complexity. My Law of Preservation of Complexity says: “There is only so much of a problem that can be generalized away. Once the theoretical maximum generalization has been achieved, what remains is the fundamental problem that must be solved through brute force. Any attempt to further generalize will actually be adding complexity or obscuring the problem.”

Imagine the case of squeezing a semi-inflated water balloon. The water inside is the fundamental complexity of a problem and the shape of the balloon is how you’ve chosen to generalize the problem. If you squeeze one end of the balloon to make it smaller, then the other end of the balloon will get bigger. If you poke the other end with your finger, then the sides of the balloon will get bigger. No matter what you do to the balloon, it will still hold the same amount of water.

Now imagine this concept in a system. Say someone designed a system with data storage that’s nothing more than one giant table of name-value pairs. “This is fantastic!” the architect will say, “we can change the data model without having to change the database!” True and not true. The fact is there’s a fundamental complexity (relationships between entities that could have been represented with tables and foreign keys) that has simply been pushed up into the application layer. The complexity did not go away, it was simply moved somewhere else, possibly into a technology that’s less adept at handling it. In the worst case scenario, the complexity was moved to more than one place; it was shot-gunned across the system at random instead of compartmentalized in one, logical place.

Whenever I have a team proposing an abstraction or a generalization to me, I ask, “is this really removing complexity altogether, or is it simply pushing it around, like a toddler would do with lima beans he doesn’t want to eat on his plate?” Only if the proposal passes that test do I recommend proceeding.

So if some complexity is inevitable, then what should we do with it? I recommend compartmentalizing it in one place. Put it behind an iron-clad facade with a well-defined interface. Behind the interface, be as ugly and brute-force as necessary, but beyond the interface keep the system elegant and generalized. Then at least the complexity is obvious, discoverable, and easier to maintain.
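As a small sketch of what that compartmentalization might look like, take the name-value table from the earlier example. Everything here is hypothetical (the interface, the class, and the key layout are invented for illustration). The point is only that the hand-rolled "foreign key" lookup lives in exactly one class, and the rest of the system sees a plain interface.

```java
import java.util.HashMap;
import java.util.Map;

// All names and the key layout are hypothetical and only illustrate the idea.
class ComplexityFacadeDemo {

    /** The clean, well-defined interface the rest of the system sees. */
    interface CustomerDirectory {
        /** Returns the customer's city, or null if it is unknown. */
        String cityOf(String customerId);
    }

    /** The one place that knows how the name-value table is laid out. */
    static class NameValueCustomerDirectory implements CustomerDirectory {
        private final Map<String, String> nameValuePairs; // stands in for the giant table

        NameValueCustomerDirectory(Map<String, String> nameValuePairs) {
            this.nameValuePairs = nameValuePairs;
        }

        @Override
        public String cityOf(String customerId) {
            // Brute force: follow the customer -> address "foreign key" by hand.
            String addressId = nameValuePairs.get("customer." + customerId + ".addressId");
            return addressId == null ? null : nameValuePairs.get("address." + addressId + ".city");
        }
    }

    /** Builds a tiny in-memory table for the demonstration. */
    static CustomerDirectory sampleDirectory() {
        Map<String, String> table = new HashMap<>();
        table.put("customer.7.addressId", "31");
        table.put("address.31.city", "Chicago");
        return new NameValueCustomerDirectory(table);
    }

    public static void main(String[] args) {
        System.out.println(sampleDirectory().cityOf("7")); // prints Chicago
    }
}
```

The relationship between customers and addresses has not gone away; it has just been given a single, discoverable home instead of being scattered through every caller.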

May 28th, 2010

Project Smell: Project Fibrillation

by Tim Cull

Maybe it’s my (ancient history) background as an EMT, but I like to use medical metaphors to describe some common themes in software projects. If you liked my post on application guarding you might like this one, too.

Many of us have been on a project that seems to be alive with frenzied activity, but nonetheless isn’t going anywhere. To me, this sounds so much like the medical condition ventricular fibrillation that I’ve decided to dub the project version of it “Project Fibrillation.”

When someone collapses from what people loosely call a “heart attack” (strictly speaking, sudden cardiac arrest), what they are often experiencing is ventricular fibrillation. Their heart hasn’t stopped beating–quite the contrary, their heart is alive with frantic electrical activity. The real problem is that the electrical activity has become uncoordinated such that when some cells in their heart muscle are squeezing with all their might, other cells are simultaneously relaxing. The net result is their heart expends a great deal of energy but doesn’t actually move any blood anywhere, kind of like a three-year-old wildly kicking in a swimming pool when learning to swim. All the frantic heart activity quickly uses up energy in the heart muscle and since blood isn’t moving properly that energy isn’t replaced. This vicious cycle quickly leads to cell death.

Sound like a project you’ve worked on?

Projects that are just getting themselves into trouble can enter fibrillation if they are not careful. A fibrillating project is one that spends much more energy on non-productive activities than it spends on productive activities. For example:

May 14th, 2010

GridGain: New Cloud Computing Partnership

by Tim Cull

I am pleased to announce a new partnership between Thedwick, LLC and GridGain, a leading, open-source cloud computing platform. With this new partnership, we now have even greater insight and access into the cloud computing world, which should make our emerging technology and research offering even more valuable to clients.

GridGain firmly believes “that grid computing technology is reaching the point where it will make a transition from one-off and expensive solution to ubiquitous technology available for the project of any size and complexity” and we are proud to be a part of making that vision happen.

May 6th, 2010

New open source SQL analysis tool: jdbcGrabber

by Tim Cull

I’m pleased to announce the first alpha release of a SQL tool I’ve been using to reverse engineer applications and find out what SQL is called from what code: jdbcGrabber, the JDBC wrapper for SQL analysis.

With this tool, you can sniff SQL as it passes through your application and see stack traces of where it was called from. Its pluggable architecture allows you to also inject your own, custom monitoring code with a minimum of fuss.
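For readers who haven’t seen the general technique before, here is a rough sketch of wrapping a JDBC Connection to record SQL and the code that issued it. This is not jdbcGrabber’s code or API; SqlSniffer and its methods are names invented for this illustration, built only on the standard java.lang.reflect.Proxy and java.sql.Connection types, and the demo uses a do-nothing stub in place of a real database.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Generic sketch of the wrapping technique; NOT jdbcGrabber's actual code or API.
class SqlSniffer {

    /** Wraps a Connection so that every prepareStatement(sql, ...) call is recorded in sink. */
    static Connection wrap(Connection target, List<String> sink) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("prepareStatement") && args != null && args[0] instanceof String) {
                // Capture the stack before delegating, so it reflects the application's call path.
                StackTraceElement[] frames = new Throwable().getStackTrace();
                sink.add(args[0] + " @ " + callerOf(frames, method.getName()));
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the driver's own exception unchanged
            }
        };
        return (Connection) Proxy.newProxyInstance(
                SqlSniffer.class.getClassLoader(), new Class<?>[] {Connection.class}, handler);
    }

    /** Finds the frame that called the proxied method, if the proxy frame is visible. */
    private static String callerOf(StackTraceElement[] frames, String proxiedMethod) {
        for (int i = 0; i < frames.length - 1; i++) {
            if (frames[i].getMethodName().equals(proxiedMethod)) {
                return frames[i + 1].toString();
            }
        }
        return "unknown caller";
    }

    /** Runs one query through a wrapped stub connection and returns what was recorded. */
    static List<String> demo() {
        List<String> sink = new ArrayList<>();
        // Stub connection that returns null from every method; stands in for a real driver.
        Connection stub = (Connection) Proxy.newProxyInstance(
                SqlSniffer.class.getClassLoader(), new Class<?>[] {Connection.class}, (p, m, a) -> null);
        Connection sniffed = wrap(stub, sink);
        try {
            sniffed.prepareStatement("SELECT id FROM earthquake");
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return sink;
    }

    public static void main(String[] args) {
        for (String entry : demo()) {
            System.out.println(entry);
        }
    }
}
```

A real tool would also need to wrap Statement and the other statement-creating methods; the sketch covers only prepareStatement to keep the idea visible.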

Try it out and, as usual, please let me know if you have any feedback or improvements.

April 21st, 2010

Java Stack Trace RegEx

by Tim Cull

This is just a quick post because it’s been a while and I wanted to save others from the pain I experienced yesterday.

If you want to parse a Java stack trace with a regular expression and pull out the class name, method name, and line number, then you can use the code below:

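// Needs java.util.regex.Matcher and java.util.regex.Pattern.
// Group 1 is the class name, group 2 the method name, group 3 the line number.
Pattern pattern = Pattern.compile("([a-zA-Z0-9_\\.]*)\\.([a-zA-Z0-9_\\.]*)\\([a-zA-Z0-9_\\.]*:([\\d]*)\\)");
Matcher matcher = pattern.matcher(traceString);
while (matcher.find()){
    String className = matcher.group(1);
    String methodName = matcher.group(2);
    // group(3) is never null on a match, but it can be empty because of the "*" quantifier.
    int lineNumber = matcher.group(3).isEmpty() ? 0 : Integer.parseInt(matcher.group(3));
}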

Note that because you are writing the regular expression as a Java string literal, you have to double-escape many of those characters. For example, if you want to say “any digit” the usual regular expression is “\d” but because you are using a Java string to define the regular expression you have to double escape it to say “\\d” instead.
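If you want to try the pattern on a real trace, here is one way to package it as a self-contained class. StackTraceParser and its "class#method:line" output format are just choices made for this example. Note that frames without a file name and line number, such as “(Native Method)”, don’t match the pattern and are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative wrapper around the pattern; the class name and output format are arbitrary.
class StackTraceParser {
    // Compiled once and reused; group 1 = class, group 2 = method, group 3 = line number.
    private static final Pattern FRAME = Pattern.compile(
            "([a-zA-Z0-9_\\.]*)\\.([a-zA-Z0-9_\\.]*)\\([a-zA-Z0-9_\\.]*:([\\d]*)\\)");

    /** Returns one "class#method:line" string per frame found in the trace. */
    static List<String> parse(String traceString) {
        List<String> frames = new ArrayList<>();
        Matcher matcher = FRAME.matcher(traceString);
        while (matcher.find()) {
            String lineText = matcher.group(3);
            int lineNumber = lineText.isEmpty() ? 0 : Integer.parseInt(lineText);
            frames.add(matcher.group(1) + "#" + matcher.group(2) + ":" + lineNumber);
        }
        return frames;
    }

    public static void main(String[] args) {
        String trace =
                "at org.jboss.util.deadlock.DeadlockDetector.deadlockDetection(DeadlockDetector.java:69)\n"
              + "at org.jboss.ejb.plugins.lock.QueuedPessimisticEJBLock.waitForTx(QueuedPessimisticEJBLock.java:292)";
        for (String frame : parse(trace)) {
            System.out.println(frame);
        }
        // First line printed: org.jboss.util.deadlock.DeadlockDetector#deadlockDetection:69
    }
}
```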

I’d like to give some props to David Matuszek whose nifty online Regular Expression Test Applet made debugging this hairy thing much easier.

March 4th, 2010

Climate Change Exposure Disclosure

by Tim Cull

Last month the SEC clarified its guidance around disclosure of risks due to climate change and carbon footprint. Today we see that guidance has had a material effect–95 different climate change petitions from investor groups representing $8 trillion in assets under management.

This is where the big difference will come in fighting climate change: requiring the market to reward companies that have made wise, long-term decisions that serve their own best interest as well as the globe’s and to punish companies that don’t. I’m looking forward to the day when we all realize, much as we did with battles over toxic waste releases in the 1970s, that being prudent about carbon footprint isn’t just tree-hugging, feelgood stuff, it’s also good business.

February 24th, 2010

Responsibility Driven Architecture: new IEEE article

by Tim Cull

This month’s issue of IEEE Software will feature an article by me and two colleagues, Stuart Blair and Richard Watt, about Responsibility Driven Architecture, an approach we’ve created to marry the traditional, enterprise-level architectural concerns with a project that wants to do development the Agile way. I encourage you to take a look if you are interested.

January 22nd, 2010

ResultSet Mocking with JMock

by Tim Cull

I found myself recently wanting to mock out a whole mess of database interaction on a legacy system. This system didn’t have a strict data access layer, so direct calls to the database were strewn throughout the business logic.

Because JDBC is such a verbose library, mocking it out can be a challenge. For this task, I found myself with horrific-looking mock methods like this:

	private void mockSpecificPeopleQuery() throws SQLException {
		final PreparedStatement stmt = context.mock(PreparedStatement.class, "specificpeoplePreparedStatement");
		final ResultSet rs = context.mock(ResultSet.class, "specificpeopleResultSet");
		final Sequence rsSequence = context.sequence("specificpeople");
		context.checking(new Expectations() {{
			// Inside this anonymous Expectations subclass, "this" is the Expectations
			// instance, so the test's dbConnection field is referenced without "this.".
			one(dbConnection).prepareStatement("SELECT mbid, people_id, name FROM specific_people"); will(returnValue(stmt));
			one(stmt).executeQuery(); will(returnValue(rs));
			one(rs).next(); inSequence(rsSequence); will(returnValue(true));
			one(rs).getString(1); will(returnValue("mbidCher")); //"mbid" column
			one(rs).getInt(2); will(returnValue(2)); //"people_id" column
			one(rs).getString(3); will(returnValue("Cher")); //"name" column
			one(rs).next(); inSequence(rsSequence); will(returnValue(false));
			one(rs).close(); inSequence(rsSequence);
			one(stmt).close(); inSequence(rsSequence);
		}});
	}

I thought there had to be a better way. I remembered and was inspired by a colleague of mine (Denis) who had once nicely encapsulated all this in a helper class. So I wrote myself a simple extension to the JMock Expectations class that makes mocked-out ResultSets a whole lot easier to read, more like this: