A Technologist Who Speaks Business
I’m at QConSF all this week, so you’ll get to hear my impressions of every session I go to. Lucky you!
LinkedIn is a 90% Java shop with lots of memcached for caching and ActiveMQ for messaging. They said they started the traditional way with big relational databases and n-tier architectures, but quickly ran into the scale wall. To give you an idea what they’re talking about, they do 35 million updates per week and 20 million service calls per day.
Once they hit real scale, they found they had to change the way they approached updates in a user’s social network, moving from an “inbox” style approach to more of an “activity area” approach.
In the old inbox-style approach, each time a user does something (say, updates their status), the system writes a notification to each of that user’s connections describing the update. This means that for every one update, the system has to do N writes, where N is the number of connections that user has. That’s the bad part. The good part is that when one of those connections goes to their home page, only one quick read is needed to see everything happening in that connection’s network.
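To make that cost concrete, here is a minimal in-memory sketch of the inbox idea. It is not LinkedIn's code; the class name `InboxFeed`, its methods, and the `write_count` counter are all made up for illustration, and plain dictionaries stand in for real storage.

```python
from collections import defaultdict


class InboxFeed:
    """Toy model of the inbox approach: copy each update to every connection."""

    def __init__(self, connections):
        # connections: dict mapping a user to the set of users connected to them
        # (hypothetical shape, chosen only for this sketch)
        self.connections = connections
        self.inboxes = defaultdict(list)
        self.write_count = 0  # counts storage writes so the cost is visible

    def post_update(self, user, text):
        # One write per connection: N writes for a user with N connections.
        for friend in self.connections.get(user, ()):
            self.inboxes[friend].append((user, text))
            self.write_count += 1

    def home_page(self, user):
        # A single read of the user's own inbox returns everything.
        return list(self.inboxes[user])


connections = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": {"alice"}}
feed = InboxFeed(connections)
feed.post_update("alice", "changed jobs")
print(feed.write_count)       # writes caused by one update
print(feed.home_page("bob"))  # what bob sees with one read
```

The write cost grows with the size of the poster's network, which is exactly the part that hurts at scale.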
The activity area approach turns this on its head. Instead of copying every update to each connection, they write it once into the user’s “activity area”. Then as that user’s connections log in, their home page does up to N reads to fetch updates from the social network. I say “up to” because they have a very clever filter and summary bit in front that narrows down the list of social connections to a subset the user is likely to care about.
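Here is a rough sketch of the activity area idea, under the same caveats: the names (`ActivityAreaFeed`, `relevance_filter`, `read_count`) are hypothetical, storage is just a dictionary, and the filter is a stand-in for whatever the real filter and summary layer does, which the talk did not detail.

```python
from collections import defaultdict


class ActivityAreaFeed:
    """Toy model of the activity area approach: write once, gather at read time."""

    def __init__(self, connections, relevance_filter=None):
        # connections: dict mapping a user to the set of users connected to them
        self.connections = connections
        self.activity = defaultdict(list)  # one activity area per user
        # Assumed hook: picks which connections are worth reading for a viewer.
        # By default it keeps everyone.
        self.relevance_filter = relevance_filter or (lambda viewer, cands: cands)
        self.read_count = 0  # counts storage reads so the cost is visible

    def post_update(self, user, text):
        # A single write, no matter how many connections the user has.
        self.activity[user].append(text)

    def home_page(self, viewer):
        candidates = sorted(self.connections.get(viewer, ()))
        selected = self.relevance_filter(viewer, candidates)
        updates = []
        for friend in selected:
            # One read per selected connection: at most N reads.
            self.read_count += 1
            updates.extend((friend, text) for text in self.activity.get(friend, []))
        return updates


connections = {"bob": {"alice", "carol", "dave"}}
skip_dave = lambda viewer, cands: [c for c in cands if c != "dave"]
feed = ActivityAreaFeed(connections, relevance_filter=skip_dave)
feed.post_update("alice", "changed jobs")
feed.post_update("dave", "had lunch")
print(feed.home_page("bob"))
print(feed.read_count)
```

The cost has moved from the writer to the reader, and the filter is what keeps the reader's cost below N.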
Then they went on to describe some of the infrastructure they use. Interestingly, as updates come in they are stored in two places:
level 1 storage: temporal, rolling store on Oracle containing CLOB data with varchar keys
level 2 storage: tenured data on Voldemort containing key-value pairs
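As a way to picture the two levels, here is a small sketch that uses no Oracle or Voldemort APIs at all. Everything in it is an assumption on my part: the class name `TwoLevelStore`, the size-based eviction (the real rolling store is presumably time-based), and the read path that checks the recent store first and falls back to the tenured one.

```python
from collections import OrderedDict


class TwoLevelStore:
    """Toy two-level store: a small rolling store plus a long-lived key-value store."""

    def __init__(self, recent_capacity):
        self.recent = OrderedDict()  # stands in for the level 1 rolling store
        self.tenured = {}            # stands in for the level 2 key-value store
        self.recent_capacity = recent_capacity

    def put(self, key, value):
        # Each incoming update is stored in both places.
        self.recent[key] = value
        self.recent.move_to_end(key)
        self.tenured[key] = value
        # Roll the oldest entries out of level 1 once it is full (assumed policy).
        while len(self.recent) > self.recent_capacity:
            self.recent.popitem(last=False)

    def get(self, key):
        # Assumed read path: recent data first, tenured data as the fallback.
        if key in self.recent:
            return self.recent[key]
        return self.tenured.get(key)


store = TwoLevelStore(recent_capacity=2)
store.put("update:1", "alice changed jobs")
store.put("update:2", "bob joined a group")
store.put("update:3", "carol added a connection")
print("update:1" in store.recent)  # rolled out of level 1
print(store.get("update:1"))       # still available from level 2
```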
And lastly some random tidbits that don’t fit anywhere else: