thedwick

March 4, 2010

Climate Change Exposure Disclosure

Last month the SEC clarified its guidance around disclosure of risks due to climate change and carbon footprint. Today we see that guidance has had a material effect–95 different climate change petitions from investor groups representing $8 trillion in assets under management.

This is where the big difference will come in fighting climate change: requiring the market to reward companies that have made wise, long-term decisions that serve their own best interest as well as the globe’s, and to punish companies that don’t. I’m looking forward to the day when we all realize, much as we did with battles over toxic waste releases in the 1970s, that being prudent about carbon footprint isn’t just tree-hugging, feel-good stuff; it’s also good business.

Filed under: Technology — Tim Cull @ 9:16 am

February 24, 2010

Responsibility Driven Architecture: new IEEE article

This month’s issue of IEEE Software will feature an article by me and two colleagues, Stuart Blair and Richard Watt, about Responsibility Driven Architecture, an approach we’ve created to marry the traditional, enterprise-level architectural concerns with a project that wants to do development the Agile way. I encourage you to take a look if you are interested!

Filed under: Technology — Tim Cull @ 4:11 pm

January 22, 2010

ResultSet Mocking with JMock

I found myself recently wanting to mock out a whole mess of database interaction on a legacy system. This system didn’t have a strict data access layer, so direct calls to the database were strewn throughout the business logic.

Because JDBC is such a verbose library, mocking it out can be a challenge. For this task, I found myself with horrific-looking mock methods like this:

	private void mockSpecificPeopleQuery() throws SQLException {
		final PreparedStatement stmt = context.mock(PreparedStatement.class, "specificpeoplePreparedStatement");
		final ResultSet rs = context.mock(ResultSet.class, "specificpeopleResultSet");
		final Sequence rsSequence = context.sequence("specificpeople");
		context.checking(new Expectations() {{
			// Inside this anonymous Expectations subclass, "this" is the Expectations object,
			// so the test's dbConnection field is referenced without "this."
			one(dbConnection).prepareStatement("SELECT mbid, people_id, name FROM specific_people"); will(returnValue(stmt));
			one(stmt).executeQuery(); will(returnValue(rs));
			one(rs).next(); inSequence(rsSequence); will(returnValue(true));
			one(rs).getString(1); will(returnValue("mbidCher")); //"mbid" column
			one(rs).getInt(2); will(returnValue(2)); //"people_id" column
			one(rs).getString(3); will(returnValue("Cher")); //"name" column
			one(rs).next(); inSequence(rsSequence); will(returnValue(false));
			one(rs).close(); inSequence(rsSequence);
			one(stmt).close(); inSequence(rsSequence);
		}});
	}

I thought there had to be a better way. I remembered and was inspired by a colleague of mine (Denis) who had once nicely encapsulated all this in a helper class. So I wrote myself a simple extension to the JMock Expectations class that makes mocked-out ResultSets a whole lot easier to read, more like this:

	private void mockSpecificPeopleQuery() throws SQLException {
		ResultSetExpectations mmsSpecificPeopleQuery = new ResultSetExpectations(this.dbConnection, this.context, "specificPeople");
		mmsSpecificPeopleQuery.expectQuery("SELECT mbid, people_id, name FROM specific_people");
		mmsSpecificPeopleQuery.newRow();
		mmsSpecificPeopleQuery.willReturnString(1,MBID_CHER);//"mbid" column
		mmsSpecificPeopleQuery.willReturnInt(2, CHER_ID);//"people_id" column
		mmsSpecificPeopleQuery.willReturnString(3, CHER);//"name" column
		mmsSpecificPeopleQuery.finishRows();
		this.context.checking(mmsSpecificPeopleQuery);
	}

Here’s the class. Feel free to pilfer it:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.jmock.Expectations;
import org.jmock.Mockery;
import org.jmock.Sequence;

/** JMock Expectations that read like the rows of the ResultSet being mocked. */
public class ResultSetExpectations extends Expectations {
	private final Sequence rowsSequence;
	private final Connection connectionMock;
	private final PreparedStatement stmt;
	private final ResultSet rs;
	private final Mockery context;

	// "disambiguation" keeps mock and sequence names unique when one test mocks several queries
	public ResultSetExpectations(Connection conn, Mockery context, String disambiguation){
		this.context = context;
		this.connectionMock = conn;
		this.rowsSequence = context.sequence(disambiguation+"Sequence");
		this.stmt = context.mock(PreparedStatement.class, disambiguation+"PreparedStatement");
		this.rs = context.mock(ResultSet.class, disambiguation+"ResultSet");
	}

	// Expect the query to be prepared and executed exactly once
	public void expectQuery(String sql) throws SQLException{
		one(this.connectionMock).prepareStatement(sql);
		will(returnValue(stmt));
		one(stmt).executeQuery();
		will(returnValue(rs));
	}

	// Start a new row: the next call to rs.next() returns true
	public void newRow() throws SQLException{
		super.one(rs).next();
		super.inSequence(this.rowsSequence);
		super.will(returnValue(true));
	}

	public void willReturnInt(int column, int returnValue) throws SQLException{
		super.one(rs).getInt(column);
		super.will(returnValue(returnValue));
	}

	public void willReturnString(int column, String returnValue) throws SQLException {
		super.one(rs).getString(column);
		super.will(returnValue(returnValue));
	}

	// No more rows: rs.next() returns false, then the statement and result set are closed
	public void finishRows() throws SQLException{
		super.one(rs).next();
		super.inSequence(this.rowsSequence);
		super.will(returnValue(false));
		super.one(stmt).close();
		super.one(rs).close();
	}

}
Filed under: HowTo, Technology — Tim Cull @ 8:35 pm

December 17, 2009

Code Smell: Application “Guarding”

There’s a medical term called “guarding” where a patient reflexively (possibly unconsciously) tenses or flinches when touched to avoid irritating an internal organ that hurts or is injured. Guarding is such a reliable reaction that it’s actually used as a diagnostic sign to indicate that something might be wrong with you.

The same thing happens with software.

Have you ever watched a team go to ridiculous lengths not to touch a certain piece of their application? That application guarding is a code smell. It means something inside is sick and needs to be treated, not ignored.

Here are two examples from projects I’ve worked on:

1) We had a company-wide XML standard for exchanging domain-specific information between departments and applications. As you might expect with such a thing, we were careful to make sure the standard was well thought out and that it changed in a controlled way so that application teams weren’t surprised by sudden breaks. But when we had the governance process around changing the schemas wrapped up too tightly, individual application teams started stuffing inappropriate values into inappropriate fields and agreeing to implied “schemas” amongst themselves. They would go to great lengths not to have to approach the central team that managed the schemas. This “guarding” around schemas was a symptom that the governance process (and in some cases, the schemas themselves) needed help.

2) I once worked on an application that had a Visual Basic 6 front end and a Java 1.3 back end. Back then (1999) people were still seriously trying to use CORBA, so we tried to use CORBA to talk between the front end and the back end. The problem was that Visual Basic had no way of talking CORBA, so we implemented a thin, client-side layer that talked CORBA and wrapped it in COM, using Microsoft Visual J++ (remember that?) which had proprietary extensions for making a Java object behave like a COM object. The whole thing was incredibly convoluted and the technologies involved had nothing to do with the technologies involved in the rest of the project. So, over time, when the signatures of the service methods really should have changed, application developers stuffed implicit switches and flags and values into existing signatures instead because they were afraid of the J++/COM/CORBA layer and didn’t know how to change it. Eventually, that application team fixed the underlying “sick organ” and replaced the whole layer with XML-RPC.

So, in summary, if there’s a part of the application your team goes to great lengths to avoid, maybe you have a sick organ. Time for a transplant!

Filed under: Technology — Tim Cull @ 8:20 pm

November 22, 2009

Common Themes — QConSF 2009 Impressions

I’m at QConSF all this week, so you’ll get to hear my impressions of every session I go to. Lucky you!

I’ve had a good week at QConSF. I’ll try to summarize some of the themes I saw in the tracks I attended.

You don’t know scale like these guys know scale

Many of the presenters were talking about applications at truly mind-blowing scale. Historically, that kind of scale would only apply to secret government operations and serious physics research. But today we’re talking about that scale for silly consumer sites where people post pictures of their cats. The best takeaway from that is that everyone should design with serious scale in mind and no longer assume it doesn’t apply to them. If it doesn’t apply to you today, it may very well tomorrow.

Cheap and horizontal, not expensive and vertical

Every one of these guys who operates at scale does so on commodity or almost-commodity hardware. I didn’t hear anyone mention 64-way Sun servers, except as a joke.

Asynchronous interaction and coupling

Applications have to be designed for asynchronous interaction. That means not only between tiers, but also with the user. To get the kinds of performance and resiliency gains many of these sessions were talking about, you can’t do it any other way. Also, asynchronicity helps take advantage of future innovations like cloud computing.

Hadoop and map reduce

Map reduce in general and Hadoop in particular were everywhere. Even the Microsoft guy made a plug for Hadoop. If you’re a technologist and looking for a good framework to spend some time learning, you can’t go wrong with Hadoop.

Measure, monitor, and profile

I remember many years ago attending an SDWest conference where every presenter said something like “and of course here you’d run your unit tests” as if using unit tests were a foregone conclusion. That’s what measuring, monitoring, and profiling were like at this conference. If you’re not doing it routinely on your applications, then you are seriously missing the boat. (That goes double for unit testing, you slackers!)

Expect failure

Enterprise developers (which is most of my background) spend a lot of time and effort designing systems so that they “never” go down. If we lose a piece of hardware, or a disk drive, or a network link, then that’s a big deal and often we’ll spend many hours in meetings explaining the root cause and what steps we’re taking to make sure it never happens again. Enterprise developers could learn a lot from the huge-scale web developers who dominated this conference: just give up! Design systems to expect failure and just retry when it happens. As systems reach a certain level of complexity, it becomes literally impossible to guarantee that all their individual components are in place and healthy at the same time, all the time. Instead we have to design systems to assume failure and survive anyway. This is the true meaning of “fault tolerant” — an often abused phrase sometimes falsely interpreted to mean “never breaks.”
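To make "expect failure and just retry" concrete, here is a minimal sketch of a retry helper. None of this comes from the conference sessions: the class name RetrySketch, the method withRetry, and the choice to double the delay after each failed attempt are all assumptions made for illustration.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an operation that is expected to fail now and then. */
final class RetrySketch {
	private RetrySketch() {}

	/**
	 * Runs the task up to maxAttempts times, sleeping between attempts and
	 * doubling the delay each time. Rethrows the last failure if every attempt fails.
	 */
	static <T> T withRetry(Callable<T> task, int maxAttempts, long initialDelayMillis) throws Exception {
		if (maxAttempts < 1) {
			throw new IllegalArgumentException("maxAttempts must be at least 1");
		}
		long delay = initialDelayMillis;
		Exception lastFailure = null;
		for (int attempt = 1; attempt <= maxAttempts; attempt++) {
			try {
				return task.call();
			} catch (InterruptedException e) {
				// Do not retry when the thread itself is being asked to stop
				Thread.currentThread().interrupt();
				throw e;
			} catch (Exception e) {
				lastFailure = e;
				if (attempt < maxAttempts) {
					Thread.sleep(delay);
					delay *= 2;
				}
			}
		}
		throw lastFailure;
	}
}
```

A caller would wrap the flaky call, for example RetrySketch.withRetry(() -> fetchProfile(id), 3, 100L), where fetchProfile is a made-up remote call. The point is that the failure path is part of the normal design rather than something to explain in a root-cause meeting.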

Eventual consistency

To listen to the presenters at this conference, you’d think ACID is dead and buried. At massive web scale, you instead have to design for ACID’s naughty younger brother: Eventual Consistency. This means that different actors using different parts of a system might get different views of the same piece of data. Eventual Consistency’s answer to that problem?: “Get over it, they’ll be the same soon enough.” As with the Expect Failure concept, I suspect this might be a hard (but necessary) pill for enterprise developers to swallow.
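Here is a toy sketch of what "different views of the same piece of data" looks like in practice. It is an in-memory model only, and everything in it is an assumption for illustration: the class name, the single primary and single replica, and the replicate() method that stands in for asynchronous replication.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model: writes land on a primary copy and reach the replica only when replicate() runs. */
final class EventuallyConsistentStoreSketch {
	private final Map<String, String> primary = new HashMap<>();
	private final Map<String, String> replica = new HashMap<>();
	private final Queue<String[]> pending = new ArrayDeque<>();

	/** The write is visible on the primary at once and queued for the replica. */
	void write(String key, String value) {
		primary.put(key, value);
		pending.add(new String[] {key, value});
	}

	String readFromPrimary(String key) {
		return primary.get(key);
	}

	/** May return a stale value (or null) until replicate() has caught up. */
	String readFromReplica(String key) {
		return replica.get(key);
	}

	/** Stands in for asynchronous replication: applies queued changes in order. */
	void replicate() {
		String[] change;
		while ((change = pending.poll()) != null) {
			replica.put(change[0], change[1]);
		}
	}
}
```

Between write() and replicate(), a reader hitting the replica sees the old state while a reader hitting the primary sees the new one. After replicate() runs, both agree, which is the "they'll be the same soon enough" part.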

Filed under: Conferences, Technology — Tim Cull @ 10:17 pm

November 20, 2009

LinkedIn: Network Updates Uncovered with Ruslan Belkin and Sean Dawson — QConSF 2009 Impressions

I’m at QConSF all this week, so you’ll get to hear my impressions of every session I go to. Lucky you!

LinkedIn is a 90% Java shop with lots of memcached for caching and ActiveMQ for messaging. They said they started the traditional way with big relational databases and n-tier architectures, but quickly ran into the scale wall. To give you an idea what they’re talking about, they do 35 million updates per week and 20 million service calls per day.

Once they hit real scale, they found they had to change the way they approached updates in a user’s social network from an “inbox” like approach to more of an “activity area” type approach.

In the old inbox-type approach, each time a user does something (say, update status), the system writes a notification to each of that user’s connections describing the update. This means that for every update, the system has to do N writes, where N is the number of that user’s connections. That’s the bad part. The good part is that when one of those connections goes to their home page, only one quick read has to be done to see everything in that connection’s network.

The activity area approach turns this on its head. Instead, each update is written once into the user’s “activity area”. Then, as that user’s connections log in, their home page does up to N reads to fetch updates from the social network. I say “up to” because they have a very clever filter and summary bit in front that narrows down the list of social connections to a subset the user is likely to care about.
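The trade-off between the two approaches can be sketched with a toy in-memory model. This is not LinkedIn's code: the class and method names are made up, the stores are plain maps, and the filter and summary step from the talk is left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy in-memory model of the two update strategies; all names here are hypothetical. */
final class NetworkUpdatesSketch {
	private final Map<String, Set<String>> connections;                       // user -> that user's connections
	private final Map<String, List<String>> inboxes = new HashMap<>();        // inbox approach
	private final Map<String, List<String>> activityAreas = new HashMap<>();  // activity area approach

	NetworkUpdatesSketch(Map<String, Set<String>> connections) {
		this.connections = connections;
	}

	/** Inbox approach: one update costs N writes, one per connection. Returns the write count. */
	int postToInboxes(String author, String update) {
		int writes = 0;
		for (String connection : connections.getOrDefault(author, Collections.<String>emptySet())) {
			inboxes.computeIfAbsent(connection, k -> new ArrayList<>()).add(author + ": " + update);
			writes++;
		}
		return writes;
	}

	/** Inbox approach: the home page needs a single read. */
	List<String> readInbox(String user) {
		return inboxes.getOrDefault(user, Collections.<String>emptyList());
	}

	/** Activity area approach: one update costs one write. */
	void postToActivityArea(String author, String update) {
		activityAreas.computeIfAbsent(author, k -> new ArrayList<>()).add(author + ": " + update);
	}

	/** Activity area approach: the home page does up to N reads, one per connection. */
	List<String> readHomePage(String user) {
		List<String> feed = new ArrayList<>();
		for (String connection : connections.getOrDefault(user, Collections.<String>emptySet())) {
			feed.addAll(activityAreas.getOrDefault(connection, Collections.<String>emptyList()));
		}
		return feed;
	}
}
```

In this sketch the cost simply moves from postToInboxes (a loop on every write) to readHomePage (a loop on every read), which is why a filter that shrinks the set of connections to read from matters so much in the second design.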

Then they went on to describe some of the infrastructure they use. Interestingly, as updates come in they are stored in two places:

  • Level 1 storage: temporal, rolling store on Oracle containing CLOB data with varchar keys
  • Level 2 storage: tenured data on Voldemort containing key-value pairs

And lastly some random tidbits that don’t fit anywhere else:

  • They use Zenoss for monitoring (as do lots of presenters here)
  • Even at this scale they still use XML and are happy about it
  • Information given to a service is sometimes unresolved (i.e. member id instead of first/last name) and gets resolved by a service in batch
  • They’ve optimized comment streams by duplicating the first and last comments in their update summary and storing the full comment thread in level 2 storage
Filed under: Conferences, Technology — Tim Cull @ 1:36 pm

Facebook’s Petabyte Scale Data Warehouse using Hive and Hadoop with Namit Jain and Ashish Thusoo — QConSF 2009 Impressions

I’m at QConSF all this week, so you’ll get to hear my impressions of every session I go to. Lucky you!

Facebook handles 200GB/day worth of updates coming in and 12+TB per day if you include derived data. That’s a lot of data and has no hope of fitting in a traditional data warehouse like Oracle. Consequently, they use Hadoop for both data storage and data processing, as do many organizations that work at that kind of scale. But once they started doing that, they ran into the problem that it’s very difficult, especially for analysts, to conduct ad-hoc queries over the data.

So, the Facebook team created Hive as a SQL-like layer over Hadoop to allow for ad-hoc analysis. Hive is an open-source subproject of Hadoop. They spent most of the talk describing Hive and some of the clever ways they use Hadoop and map-reduce to execute SQL-like queries in parallel.
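To give a feel for how a SQL-like query turns into map-reduce, here is a plain-Java sketch of what something like SELECT country, COUNT(*) FROM visits GROUP BY country boils down to. The table and column names are made up, no Hadoop or Hive APIs are used, and the real query planning is far more involved than this.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of a GROUP BY count as map and reduce phases. */
final class GroupByCountSketch {
	private GroupByCountSketch() {}

	/** Map phase: emit a (key, 1) pair for every input row. Rows are independent, so this part can run in parallel. */
	static List<Map.Entry<String, Integer>> map(List<String> rows) {
		List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
		for (String row : rows) {
			emitted.add(new SimpleEntry<>(row, 1));
		}
		return emitted;
	}

	/** Shuffle and reduce phase: bring pairs with the same key together and sum their counts. */
	static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
		Map<String, Integer> counts = new TreeMap<>();
		for (Map.Entry<String, Integer> pair : emitted) {
			counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
		}
		return counts;
	}
}
```

In a real cluster the map step runs on many machines at once and the framework handles moving pairs with the same key to the same reducer. Here both phases run in one process purely to show the shape of the computation.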

To give you an idea of the kind of load they’re putting through the system, they said they have a production Hadoop cluster with 5800 cores and 8.7 PB of raw storage (divide by 3 to account for replication, so roughly 2.9 PB of unique data). Over this cluster they run ~7500 Hive jobs per day. Wow. That’s not just massive scale, that’s mind-blowing scale.

Filed under: Conferences, Technology — Tim Cull @ 11:26 am

November 19, 2009

CA Clean Tech Open Winner: EcoFactor

I had the privilege of helping out with the CA CleanTech Open on Tuesday. This year, the $250,000 winner was EcoFactor, a company that makes smart thermostats that dynamically adjust themselves based on the weather and other factors. According to the company, an average household can shave as much as 20% off their home heating load without feeling any difference in comfort. Earth2Tech covered the Awards Gala in more detail than I could have (hey, I was busy being one of the event photographers, ok!).

The energy in this organization is impressive and the momentum the whole green tech sector has gained in the last three years is overwhelming.  Each of the three years I’ve been going to the awards gala, the production has doubled in size. That’s even more impressive when you know that the whole thing is put on by volunteers.

Each day when I read a new article in the press about climate change and the drastic adjustments we’ll all have to make just to stay alive, I find myself getting down. But being in that room Tuesday was a real boost. The whole place was packed with the nation’s best and brightest and everyone was swimming in the right direction. The pitches from this year’s companies were the best yet and show a maturing and a focus that might just pull us through this thing.

Filed under: Green, Technology — Tim Cull @ 9:29 am

November 18, 2009

Caching at Scale, Architecture Reviews, and Hadoop — QConSF 2009 Impressions

I’m at QConSF all this week, so you’ll get to hear my impressions of every session I go to. Lucky you!

Today is the start of the shorter sessions, so you get a three-for-one deal.

1) Caching at Scale with Alex Miller
Miller works for Terracotta and so most of what he concentrated on was EHCache and Terracotta. Much of the session had to do with configuring and using each of those tools, but I did get a couple of good reminders about what’s good to cache and what isn’t. Specifically, before caching something, make sure it has good “locality” (i.e. the same piece of data tends to be asked for in clumpy bursts of time) and a good distribution (i.e. the majority of people ask for a small subset of the total data universe).
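The effect of locality and distribution on a cache is easy to see with a small experiment. The sketch below is not EHCache or Terracotta: it is a hypothetical LRU cache built on the JDK's LinkedHashMap that counts hits and misses, so you can compare a skewed access pattern against an evenly spread one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Small LRU cache built on LinkedHashMap that also counts hits and misses. */
final class CountingLruCache<K, V> {
	private final Map<K, V> entries;
	private long hits;
	private long misses;

	CountingLruCache(final int capacity) {
		// accessOrder=true moves an entry to the end whenever it is read,
		// so the eldest entry is always the least recently used one
		this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
			@Override
			protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
				return size() > capacity;
			}
		};
	}

	/** Returns the cached value, or loads and caches it on a miss. A null value is treated as a miss. */
	V get(K key, Function<K, V> loader) {
		V value = entries.get(key);
		if (value != null) {
			hits++;
			return value;
		}
		misses++;
		value = loader.apply(key);
		entries.put(key, value);
		return value;
	}

	/** Fraction of lookups served from the cache so far. */
	double hitRate() {
		long total = hits + misses;
		return total == 0 ? 0.0 : (double) hits / total;
	}
}
```

With a capacity of 2, the request sequence a, a, a, b, a gets three hits out of five lookups, while the evenly spread sequence a, b, c, a, b, c gets none, because every key is evicted just before it is asked for again.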

2) Lessons Learned from Architecture Reviews with Rebecca Wirfs-Brock
Wirfs-Brock opened with two slides showing two different ideas of “collaborative.” In one, all the stakeholders and reviewers of an architecture gather together in harmony and all are shooting for the same goal for the common good. In the other, they only collaborate in the sense that the conquered collaborate with their occupying army. It’s important to know which kind of situation you’re in before picking your toolset to deal with it. I was a little shaken when she showed a slide of my boss’ book and said it was an example of a toolset to use in the occupying army kind of collaboration. What does that say about my day job?!

My best takeaway from the talk is that it’s useful to clearly organize your architectural feedback into buckets:
1) Recommendations — we really think you need to do these and not doing them would be a mistake
2) Suggestions — if you do these I predict they will make you happy, but you won’t miss them if you don’t do them
3) Observations — a place to put statements about perceived problems that aren’t really problems, or point out good choices that should be kept

3) Hadoop with Philip Ziegler
Hadoop is a system for running massive calculations over massive amounts of data. Ziegler took us through an overview of it that was good and engaging, but really not much different from what you can get reading the web site.

Filed under: Conferences, Technology — Tim Cull @ 4:53 pm

Domain Specific Languages with Ola Bini and Martin Fowler — QConSF 2009 Impressions

I’m at QConSF all this week, so you’ll get to hear my impressions of every session I go to. Lucky you!

In college, I never took the “compilers” class that most of my other classmates took. For the first five years of my career I felt smug and superior for not having wasted the time, but I’ve spent every year since then regretting the decision.

Domain Specific Languages (or DSLs) are not a new concept (how long has “make” been around?). But they have been catching a lot of renewed attention lately. We are seeing a convergence of enabling technologies that make creating them easier: dynamic languages like Ruby and Python and easy-to-use parsers like ANTLR. This ease is important because, by definition, a DSL has to be specific to a domain and therefore you can’t spread the cost of its creation over very many projects.

Nowadays I thoroughly regret not having taken the compilers class, but Bini and Fowler’s day-long session yesterday helped fill some of the gap. The session was very technical and (frankly) very dry. They talked a great deal about parsers, parse trees and semantic models. But the density was warranted because you can’t really understand the full power of a DSL until you know those concepts. Fowler and Bini gave us just enough background not to hang ourselves with our shiny new ropes.

One point they hammered again and again was that you need to keep your syntax, your semantic model, and your execution separate. For example, if you were to write a new kind of Spring configuration file format in JSON, then JSON would be your syntax, the Spring BeanDefinition interface would be your semantic model, and the Spring GenericApplicationContext might be your executing code. Many implementations of DSLs might be tempted to leap directly from parsing the input to calling code on the fly, but according to the presenters that usually leads to heartache as your DSL becomes more complex.
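Here is a minimal sketch of that three-way separation, using a made-up toy language rather than anything shown in the session. The DSL, with lines such as "set total 5" and "add total 3", and all the class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy DSL with lines such as "set total 5" and "add total 3"; everything here is hypothetical. */
final class TinyDslSketch {
	private TinyDslSketch() {}

	/** Semantic model: one instruction, with no knowledge of the text it came from. */
	static final class Instruction {
		final String operation;
		final String variable;
		final int amount;

		Instruction(String operation, String variable, int amount) {
			this.operation = operation;
			this.variable = variable;
			this.amount = amount;
		}
	}

	/** Syntax: turns DSL text into the semantic model and does nothing else. */
	static List<Instruction> parse(String script) {
		List<Instruction> program = new ArrayList<>();
		for (String line : script.split("\n")) {
			String trimmed = line.trim();
			if (trimmed.isEmpty()) {
				continue;
			}
			String[] tokens = trimmed.split("\\s+");
			boolean knownOperation = tokens[0].equals("set") || tokens[0].equals("add");
			if (tokens.length != 3 || !knownOperation) {
				throw new IllegalArgumentException("Cannot parse line: " + trimmed);
			}
			program.add(new Instruction(tokens[0], tokens[1], Integer.parseInt(tokens[2])));
		}
		return program;
	}

	/** Execution: walks the semantic model; it never sees the original text. */
	static Map<String, Integer> execute(List<Instruction> program) {
		Map<String, Integer> variables = new HashMap<>();
		for (Instruction instruction : program) {
			if (instruction.operation.equals("set")) {
				variables.put(instruction.variable, instruction.amount);
			} else {
				// "add" on a variable that was never set starts from the added amount
				variables.merge(instruction.variable, instruction.amount, Integer::sum);
			}
		}
		return variables;
	}
}
```

Because execute only depends on the list of Instruction objects, the same program could be produced by a different front end (say, a JSON or XML syntax) without touching the execution code, and the model can be validated or inspected before anything runs.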

They also went into detail about the difference between an External DSL (something you have to write a parser for, like Ant) and an Internal DSL (basically helper functions on top of an existing language, like Rake).
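For contrast, an internal DSL needs no parser at all because the host language does the parsing. The sketch below is a hypothetical fluent API in Java, loosely in the spirit of declaring build tasks. It is not Rake's syntax, and the names task, dependsOn, and describe are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical fluent API sketching an internal DSL for declaring build tasks. */
final class TaskDslSketch {
	private final String name;
	private final List<String> dependencies = new ArrayList<>();

	private TaskDslSketch(String name) {
		this.name = name;
	}

	/** Entry point of the DSL; reads as task("test"). */
	static TaskDslSketch task(String name) {
		return new TaskDslSketch(name);
	}

	/** Returns this so that calls can be chained into a sentence-like expression. */
	TaskDslSketch dependsOn(String... taskNames) {
		dependencies.addAll(Arrays.asList(taskNames));
		return this;
	}

	/** Renders the task and its dependencies in declaration order. */
	String describe() {
		return name + " <- " + dependencies;
	}
}
```

A declaration then reads as TaskDslSketch.task("test").dependsOn("compile").dependsOn("resources", "lint"), and describe() on that chain returns "test <- [compile, resources, lint]". The JMock expectation blocks earlier on this page are a larger example of the same idea.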

Filed under: Conferences, Technology — Tim Cull @ 4:33 pm