Monday, September 26, 2011

Megastore: Providing Scalable, Highly Available Storage for Interactive Services

J. Baker, C. Bond, J.C. Corbett, JJ Furman, A. Khorlin, J. Larson, J-M Léon, Y. Li, A. Lloyd, V. Yushprakh. Megastore: Providing Scalable, Highly Available Storage for Interactive Services. CIDR 2011 
This paper talks about Megastore, a storage framework developed and used at Google that guarentees NoSQL based scalability, high availability and fairly strong ACID based semantics. In my view, the prime focus of the paper was essentially to strike a balance between 3 inherent tradeoffs: Scalability, Wide Area Replication and ACID based semantics. While Brewer's CAP theorem only guarantees us any 2 out of Consistency, Availability and Partition Tolerance, Megastore successfully highlights that we can build real world systems that could strike a good balance between these 3 properties. The following points very briefly highlight the key contributions of this paper in each of the aforementioned areas:

  1. Scalability: Partitions data into entity groups (which can be thought of as a small database) and stores them in Bigtable.
  2. ACID Transactions: Maintains a write-ahead log per entity group to track all writes. Further, it performs two-phase commits or implements asynchronous queues to pass messages between entity groups.
  3. Wide-Area Replication: It used Paxos for replicating data (which is kind of cool). Further, it tweaked it for optimal latency. (On average, it incurs an overhead of 1 WAN RTT for writes and 0 WAN RTTs i.e. local reads).
 It also presents a schema language as well which allows applications to have full control on the query plan (indices, joins etc.) as well as placement of data.

Comments/Critiques:

As I highlighted in the class, the success of MegaStore depends on its wide applicability to a variety of workloads that could be cleanly divided between entity groups without incurring too much realtime intra-group communication overhead. While this does work with emails, blogs, maps etc., it is unclear to me if it data partitioning into entity groups would still be as trivial in (say) complex social graphs. Recent developments like Facebook Ticker or publishing realtime twitter feeds doesn't fit too well in the Megastore Entity-Group model.

Secondly, it is unclear to me if Megastore made the right tradeoffs by choosing consistency over performance. While the paper does try to paint a very rosy picture in terms of query latencies, a write latency of 100-400ms and read latency of 10ms is not negligible and hiding it requires considerable effort on part of developers.

Last but not the least, while building Megastore on their exisiting BigTable based infrastructure does make sense for Google in terms of better management, scalability and performance, it is a bit unclear to me if that is indeed a right way to go? What if Google had used MySQL servers for each set of entity groups? A general question might be: Are NoSQL DataStores the right building blocks for data parallel computing? Or would a collection of RDBMS's do the same job well?   

No comments:

Post a Comment