Archive for March, 2006

March 25th, 2006

Pass-Through Attributes

by Tim Cull

In an enterprise environment, systems don’t just exist by themselves. They usually sit somewhere in the middle of a longish chain of systems that implement a long and complicated business process.

Take the systems at the financial company where I work as an example. Our researchers have a set of systems to find exploitable trends. The research systems send signals to our portfolio management systems that actually keep track of what we own. The portfolio management systems create orders and send them to our trade order management systems. The trading systems send orders to brokers and get fills back. Then the trading systems let the rest of our back office (a dozen or so other systems) know what happened.

Each system along the way only really cares about a subset of the total universe of information there is to know from signal to trade. But here’s the kicker: often a system will care about information from upstream that its immediate neighbors don’t care about.

Say you have a business process flow like this for an Order:
Research->Trading->Accounting

Research creates some piece of information (call it FooFactor) that Accounting needs to know. But Trading couldn’t care less about FooFactor. How does Research tell Accounting what the FooFactor is?

There are several common approaches:

Hot Potato: Information is passed in a chain one system after the other. This is the easiest to understand and probably the most common. In this case, Research creates an Order that contains a FooFactor attribute. Trading preserves the FooFactor as it was given to it and passes the whole Order on to Accounting. Sounds simple, right? But we’ve glossed over several details that make this whole thing much more difficult, such as:
1. The Research, Trading, and Accounting systems are maintained by different teams with different release cycles, managers, and priority setting steering committees.
2. Research, Trading, and Accounting don’t really operate on the same Order object. Most likely, there’s a ResearchOrder, TradingOrder, and AccountingOrder. Research indirectly creates the TradingOrder by calling some public API on the Trading system and Trading indirectly creates the AccountingOrder by calling an API on the Accounting system.

Why does #1 matter? Well, if Research and Accounting suddenly want to start tracking a new piece of information (call it BarFactor), then they need to get themselves on the schedule for Trading. And the steering committee for Trading doesn’t get any value from diverting its programmers to add BarFactor to the TradingOrder.
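Here is a rough sketch of that problem in Python. The three order classes come from the text above, but the symbol and quantity fields, the function names, and all the values are made up for illustration. Trading’s API only copies the attributes it knows about, so BarFactor never reaches Accounting until Trading ships a release.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResearchOrder:
    symbol: str          # hypothetical field
    quantity: int        # hypothetical field
    foo_factor: float
    bar_factor: float    # the new attribute Research and Accounting agreed on


@dataclass
class TradingOrder:
    symbol: str
    quantity: int
    foo_factor: str      # Trading doesn't use it, so it just keeps the text


@dataclass
class AccountingOrder:
    symbol: str
    quantity: int
    foo_factor: float
    bar_factor: Optional[float] = None


def submit_to_trading(order: ResearchOrder) -> TradingOrder:
    # Trading's public API copies only the fields it knows about.
    # bar_factor is dropped here until the Trading team schedules a change.
    return TradingOrder(order.symbol, order.quantity, str(order.foo_factor))


def submit_to_accounting(order: TradingOrder) -> AccountingOrder:
    # Accounting can only see what Trading chose to pass along.
    return AccountingOrder(order.symbol, order.quantity, float(order.foo_factor))


research_order = ResearchOrder("IBM", 100, 1.5, 0.25)
accounting_order = submit_to_accounting(submit_to_trading(research_order))
print(accounting_order)
```

FooFactor survives the trip, but BarFactor arrives as None even though Research set it.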

Why does #2 matter? Think about what happens when you’re not forced to use the same datatype. Maybe Research uses a Double for FooFactor. Trading, because it doesn’t care about FooFactor, just stores it as a String. Accounting may use a Float. This all works fine until the first time someone places an order with a FooFactor of 40 billion and 1/1000000. Then Research works fine but Accounting blows up. Or even worse, Trading silently truncates FooFactor because its database representation of FooFactor only has room for 10 characters. So Accounting gets the wrong number but doesn’t know it.
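To make the mismatch concrete, here is a small Python sketch under a few assumptions of mine: Research’s value is modeled as a Decimal so the example starts from the exact number, Trading’s storage is mimicked by a hypothetical 10-character text column, and Accounting’s Float is taken to be a 4-byte IEEE 754 float. The helper names are invented for the example.

```python
import struct
from decimal import Decimal


def to_float32(value):
    """Round-trip a number through a 4-byte IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


def store_in_column(text, width=10):
    """Mimic a fixed-width text column that silently drops extra characters."""
    return text[:width]


# The exact value Research intended: 40 billion and 1/1000000.
research_foo = Decimal("40000000000.000001")

# Trading keeps only the first 10 characters of the text form.
trading_foo = store_in_column(str(research_foo))

# Accounting converts whatever it receives into a 4-byte float.
accounting_from_research = to_float32(research_foo)
accounting_from_trading = to_float32(trading_foo)

print(trading_foo)
print(accounting_from_research)
print(accounting_from_trading)
```

In this sketch nothing actually raises an error. The fraction simply disappears in the 4-byte float, and the truncated string turns 40 billion into 4 billion, which is the kind of wrong number nobody notices until much later.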

Big Data Repository: Nobody knows anything, everything is done in a hub and spoke fashion, like FedEx (or should that be FedEx-Kinkos?). Each system checks an Order out of some central data repository, operates on it, and then puts it back again. Now Research can add BarFactor to an Order for Accounting to read and Trading doesn’t have to lift a finger.
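A minimal sketch of the check-out and check-in cycle described above, assuming an in-memory dictionary as a stand-in for the real repository. The OrderRepository class, the step functions, the order ID, and the status field are all hypothetical.

```python
import copy


class OrderRepository:
    """Hypothetical in-memory stand-in for the central data repository."""

    def __init__(self):
        self._orders = {}

    def check_out(self, order_id):
        # Hand back a copy so callers can't change the stored order by accident.
        return copy.deepcopy(self._orders.get(order_id, {"id": order_id}))

    def check_in(self, order):
        self._orders[order["id"]] = copy.deepcopy(order)


def research_step(repo, order_id):
    order = repo.check_out(order_id)
    order["FooFactor"] = 1.5
    order["BarFactor"] = 0.25   # new attribute, no change needed in Trading
    repo.check_in(order)


def trading_step(repo, order_id):
    order = repo.check_out(order_id)
    order["status"] = "FILLED"  # Trading touches only what it cares about
    repo.check_in(order)


def accounting_step(repo, order_id):
    order = repo.check_out(order_id)
    return order["FooFactor"], order["BarFactor"]


repo = OrderRepository()
research_step(repo, "order-1")
trading_step(repo, "order-1")
factors = accounting_step(repo, "order-1")
print(factors)
```

Trading never mentions BarFactor, yet Accounting can still read it. Every step also goes through the one repo object, which is the catch.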

Sounds great, right? But think about what you’ve just done. Now your entire business depends on one data repository. If it goes down, you are hosed. Not just a little hosed, or hosed only in one place, but so completely hosed that literally everyone in your company is looking for the IT person to fire. (This has happened to us. It’s not pretty.)

On top of your shiny new single-point-of-failure you now have a coordination headache. Say the original design of FooFactor was bone-headed and you need to change it. Say from a String to a Double. How do you figure out which systems might be using FooFactor? How do you make sure they all release their patches at the same instant? How do you migrate historical data? How do you get the change high enough on all their steering committee queues?

Lastly, if everything in your company’s business process must pass through a central repository, guess where your biggest performance bottleneck is going to be?

Don’t Shoot the Messenger: Everything is published to everybody on a message bus. In this case, Research publishes the Order to the message bus once, and Trading and Accounting each pull what they want off that message.

That sounds pretty wonderful and in reality it certainly has its advantages. You still have a single point of failure in the messaging infrastructure, but somehow it seems less of a risk because no data is actually stored there, so in a pinch you can work around it manually. Research and Accounting can add what they want to the message without involving Trading (assuming the message format is something flexible like XML). You have to worry less about data types because, again assuming XML and schema constraints aside, everything is more or less a string.
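Here is a toy version of the idea, assuming an in-process bus and an XML Order message. The MessageBus class, the handler names, and the element names other than FooFactor and BarFactor are invented for the example; a real deployment would use actual messaging middleware.

```python
import xml.etree.ElementTree as ET


class MessageBus:
    """Hypothetical in-process bus: every subscriber sees every message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            handler(message)


trading_seen = {}
accounting_seen = {}


def trading_handler(xml_text):
    # Trading reads only the elements it needs and ignores the rest.
    order = ET.fromstring(xml_text)
    trading_seen["Symbol"] = order.findtext("Symbol")
    trading_seen["Quantity"] = order.findtext("Quantity")


def accounting_handler(xml_text):
    # Accounting picks up the attributes Trading never looks at.
    order = ET.fromstring(xml_text)
    accounting_seen["FooFactor"] = order.findtext("FooFactor")
    accounting_seen["BarFactor"] = order.findtext("BarFactor")


bus = MessageBus()
bus.subscribe(trading_handler)
bus.subscribe(accounting_handler)

# Research publishes once; adding BarFactor required no change to Trading.
bus.publish(
    "<Order><Symbol>IBM</Symbol><Quantity>100</Quantity>"
    "<FooFactor>1.5</FooFactor><BarFactor>0.25</BarFactor></Order>"
)
print(trading_seen)
print(accounting_seen)
```

Note that every value comes off the message as a string, so each consumer decides for itself how to parse it.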

But what happens when you need to modify an existing attribute in the Order message? Again you have the same organizational and prioritization constraints. Testing your changes becomes harder just because there are more moving pieces. And since each system probably persists a working copy of the Order in a local data store, you have to spend an inordinate amount of time reconciling and cross checking data to make sure a message wasn’t dropped or mis-processed somewhere.
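The reconciling and cross checking might look something like the sketch below. It assumes each system keeps its orders in a dictionary keyed by order ID, which is my simplification; the reconcile function and the sample data are hypothetical.

```python
def reconcile(published, local_store):
    """Compare what a publisher sent with a subscriber's local copies.

    Returns the IDs that never arrived and the IDs whose contents differ.
    """
    missing = sorted(set(published) - set(local_store))
    mismatched = sorted(
        order_id
        for order_id in set(published) & set(local_store)
        if published[order_id] != local_store[order_id]
    )
    return missing, mismatched


# What Research believes it published (made-up data).
research_sent = {
    "order-1": {"FooFactor": "1.5"},
    "order-2": {"FooFactor": "2.0"},
    "order-3": {"FooFactor": "3.25"},
}

# What Accounting actually has in its local store (made-up data).
accounting_store = {
    "order-1": {"FooFactor": "1.5"},
    "order-3": {"FooFactor": "3.2"},
}

missing, mismatched = reconcile(research_sent, accounting_store)
print(missing)
print(mismatched)
```

Here order-2 was dropped somewhere and order-3 was mis-processed. Multiply this by every pair of systems that share a message and you can see where the time goes.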

May I Be of Service?: The latest and greatest thing is Service Oriented Architecture (SOA). This is the only one of these I haven’t had much opportunity to use “in anger,” but it has its own limitations, too.

An SOA in its ideal sense is a set of isolated services that each implement a specific business need, plus some sort of orchestrator that connects them all together. Over time, though, this orchestrator starts to look like a full-blown application in its own right. In this particular case (pass-through attributes), you end up getting all the drawbacks of all the possibilities above with few of their benefits.

And the Winner Is?: I favor the messaging approach. It’s as decoupled as you can reasonably get, and at runtime you don’t have any dependencies on any other systems. Just be sure you have a good sized support team to chase “missing” data.

March 10th, 2006

Book Review-let: The Tipping Point

by Tim Cull

I call this a review-let because it will be very, very short and full of sweeping generalizations. I’m still leaning on my 8-day-old daughter as an excuse.

I read all 260 pages of The Tipping Point in a day and a half (before my daughter was born), which will seem spectacular only if you know that in the last 6 months, I’ve only read about 250 pages of anything in total.

The book was written right in the middle of the dot-com era and it was interesting to see how either it was heavily affected by the buzz then, or it is what created the buzz then. Either way, now 5 years down the line we can see some of its ideas played out in various guerrilla marketing tactics like spray painting the IBM-ified Linux penguin on sidewalks, or super-targeted mailings and promotional campaigns, or political campaigns like Howard Dean’s.

Anyhow, I obviously found the book interesting enough to read it in one day, so I recommend it.

March 8th, 2006

Why Such Sparse Posts?

by Tim Cull

So my posting has slowed a bit lately, which requires some explanation: I just had a new baby girl!

She and mom are doing great and her older brother is starting to acknowledge her existence, so things are great. But for the next few weeks at least, most of what you’re going to see here is links and that’s about it. But I’ll be back with some real posts soon, not to worry.

March 7th, 2006

A Post Mostly for Myself

by Tim Cull

I want to do my next personal project in Ruby and here is what people say is a good introduction to Ruby (I haven’t read it yet):

http://poignantguide.net/ruby/

I’m mostly putting the link here as a reminder to myself.