Thedwick

A Technologist Who Speaks Business

1282 days ago
LinkedIn: Network Updates Uncovered with Ruslan Belkin and Sean Dawson — QConSF 2009 Impressions

I’m at QConSF all this week, so you’ll get to hear my impressions of every session I go to. Lucky you!

LinkedIn is a 90% Java shop with lots of memcached for caching and ActiveMQ for messaging. They said they started the traditional way with big relational databases and n-tier architectures, but quickly ran into the scale wall. To give you an idea of what they’re talking about, they handle 35 million updates per week and 20 million service calls per day.
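
To put those figures in per-second terms, here is a quick back-of-the-envelope calculation. It assumes the load is spread evenly over the week and the day, which real traffic never is, so read the results as averages rather than peaks.

```python
# Average rates implied by the figures quoted in the talk.
# Assumption: load is spread evenly; real peaks will be well above this.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY

updates_per_week = 35_000_000
service_calls_per_day = 20_000_000

updates_per_second = updates_per_week / SECONDS_PER_WEEK
calls_per_second = service_calls_per_day / SECONDS_PER_DAY

print(f"{updates_per_second:.0f} updates/s")       # about 58
print(f"{calls_per_second:.0f} service calls/s")   # about 231
```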

Once they hit real scale, they found they had to change the way they handled updates in a user’s social network, moving from an “inbox”-style approach to more of an “activity area” approach.

In the old inbox-style approach, each time a user does something (say, updates their status), the system writes a notification to each of that user’s connections describing the update. This means that for every single update, the system has to do N writes, where N is the number of that user’s connections. That’s the bad part. The good part is that when one of those connections goes to their home page, only one quick read is needed to see everything in that connection’s network.
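
The inbox approach can be sketched in a few lines. This is only an illustration of the idea, not LinkedIn's code: the `connections` and `inboxes` dictionaries and both function names are hypothetical in-memory stand-ins for whatever stores they actually used.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the real stores.
connections = {"alice": ["bob", "carol", "dave"]}
inboxes = defaultdict(list)  # member -> notifications waiting for them


def post_update_inbox(author, text):
    """Fan out on write: one write per connection. Returns the write count."""
    writes = 0
    for member in connections.get(author, []):
        inboxes[member].append((author, text))
        writes += 1
    return writes


def read_home_page_inbox(member):
    """A single read: everything is already sitting in the member's inbox."""
    return list(inboxes[member])


writes = post_update_inbox("alice", "changed jobs")
print(writes)                       # 3 writes for 3 connections
print(read_home_page_inbox("bob"))  # [('alice', 'changed jobs')]
```

The cost is all on the write side: a member with thousands of connections triggers thousands of writes for one status change.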

The activity area approach turns this on its head. Instead, every update is written once into an “activity area”. Then, as that user’s connections log in, their home page does up to N reads to fetch updates from the social network. I say “up to” because they have a very clever filter and summary bit in front that narrows down the list of social connections to a subset the user is likely to care about.
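
Here is a rough sketch of the activity area idea. Everything in it is an assumption made for illustration: the talk did not describe how the filter works, so the `recently_active` set below is just a placeholder for it, and all the names are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the real stores.
connections = {"bob": ["alice", "carol", "dave"]}
activity_areas = defaultdict(list)  # author -> that author's own updates
recently_active = set()             # placeholder for the filter/summary data


def post_update(author, text):
    """One write, no matter how many connections the author has."""
    activity_areas[author].append(text)
    recently_active.add(author)


def likely_interesting(candidates):
    """Placeholder filter: keep only connections with recent activity."""
    return [c for c in candidates if c in recently_active]


def read_home_page(member):
    """Fan out on read: up to N reads, one per connection passing the filter."""
    feed = []
    reads = 0
    for author in likely_interesting(connections.get(member, [])):
        reads += 1
        feed.extend((author, text) for text in activity_areas[author])
    return feed, reads


post_update("alice", "changed jobs")
feed, reads = read_home_page("bob")
print(feed)   # [('alice', 'changed jobs')]
print(reads)  # 1 read instead of 3, thanks to the filter
```

The work has moved from write time to read time, which is why the filter in front matters so much: it decides how close to N the read count gets.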

Then they went on to describe some of the infrastructure they use. Interestingly, as updates come in they are stored in two places:

  • Level 1 storage: a temporal, rolling store on Oracle containing CLOB data with varchar keys
  • Level 2 storage: tenured data on Voldemort containing key-value pairs
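
To make the two-level idea concrete, here is a toy sketch. It is not how Oracle or Voldemort are actually accessed; `RollingStore`, `store_update`, the 60-second window, and the plain dictionary standing in for level 2 are all assumptions for illustration. It also assumes timestamps arrive in order.

```python
from collections import deque


class RollingStore:
    """Stand-in for level 1: keeps only entries newer than max_age seconds."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.entries = deque()  # (timestamp, key, payload), oldest first

    def put(self, timestamp, key, payload):
        self.entries.append((timestamp, key, payload))
        self.expire(timestamp)

    def expire(self, now):
        # Drop entries from the old end until the oldest is inside the window.
        while self.entries and now - self.entries[0][0] > self.max_age:
            self.entries.popleft()

    def keys(self):
        return [key for _, key, _ in self.entries]


level1 = RollingStore(max_age=60)  # hypothetical window length
level2 = {}                        # stand-in for the tenured key-value store


def store_update(timestamp, key, payload):
    """Each incoming update is written to both levels."""
    level1.put(timestamp, key, payload)
    level2[key] = payload


store_update(0, "update:1", "<update>alice changed jobs</update>")
store_update(100, "update:2", "<update>bob joined a group</update>")

print(level1.keys())   # ['update:2'] since update:1 has rolled out
print(sorted(level2))  # ['update:1', 'update:2']
```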

And lastly some random tidbits that don’t fit anywhere else:

  • They use Zenoss for monitoring (as do lots of presenters here)
  • Even at this scale they still use XML and are happy about it
  • Information passed to a service is sometimes unresolved (i.e., a member ID instead of a first/last name) and
    gets resolved later by a service in batch
  • They’ve optimized comment streams by keeping copies of the first and last comments in the update summary and the full comment thread in level 2 storage
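
The batch resolution tidbit is easy to picture in code. This is a minimal sketch under my own assumptions: `MEMBER_NAMES`, `resolve_members`, and `render_updates` are hypothetical names, and the dictionary lookup stands in for what would really be a remote service call.

```python
# Hypothetical member directory; in practice this would be a service call.
MEMBER_NAMES = {101: "Alice Example", 102: "Bob Example"}
lookup_calls = 0


def resolve_members(member_ids):
    """Resolve many IDs in one call instead of one call per ID."""
    global lookup_calls
    lookup_calls += 1
    return {mid: MEMBER_NAMES.get(mid, "Unknown member") for mid in member_ids}


def render_updates(updates):
    """Updates carry only member IDs; names are filled in with one batch call."""
    ids = {u["member_id"] for u in updates}
    names = resolve_members(ids)
    return [f'{names[u["member_id"]]}: {u["text"]}' for u in updates]


updates = [
    {"member_id": 101, "text": "changed jobs"},
    {"member_id": 102, "text": "joined a group"},
    {"member_id": 101, "text": "added a connection"},
]
rendered = render_updates(updates)
print(rendered)
print(lookup_calls)  # 1 lookup for three updates
```

Carrying bare IDs through the pipeline keeps the stored updates small, and the names are looked up once per batch right before display.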