Monday, October 31, 2011

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. H. Katz, S. Shenker, and I. Stoica. Mesos: A platform for fine-grained resource sharing in the data center. NSDI 2011.
This paper presents Mesos, a platform for fine-grained resource sharing among cloud computing frameworks. It introduces a distributed two-level scheduling mechanism: the central scheduler (the Mesos master) decides how many resources to offer each framework, while the individual frameworks decide which of the offered resources to accept. The result is not globally optimal, but the authors claim that the system works very well in practice. To be Mesos-compatible, a framework only needs to implement a scheduler that registers with the Mesos master and an executor process that is launched on slave nodes to run the framework's tasks. Mesos uses resource filters for efficiency, kills long-running tasks when necessary to prevent starvation, and uses Linux containers for CPU/memory isolation. The paper features an impressive evaluation section, and the authors show that Mesos can scale up to 50,000 virtual nodes on Amazon EC2.
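To make the two-level split concrete, here is a minimal Python sketch of the resource-offer idea. It is not the Mesos API: the class names (`Master`, `Framework`, `Offer`, `Task`), the method names, and the "fewest allocated CPUs first" offer order are all assumptions made for illustration. The master only chooses whom to offer free resources to; each framework chooses which tasks, if any, to launch on the offer.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    node: str
    cpus: int
    mem_gb: int


@dataclass
class Task:
    name: str
    cpus: int
    mem_gb: int


class Framework:
    """Hypothetical framework scheduler: decides which offers to use."""

    def __init__(self, name, pending):
        self.name = name
        self.pending = list(pending)  # tasks still waiting for resources

    def resource_offer(self, offer):
        """Return the tasks to launch on this offer (empty list = decline)."""
        launched = []
        cpus, mem = offer.cpus, offer.mem_gb
        for task in list(self.pending):
            if task.cpus <= cpus and task.mem_gb <= mem:
                launched.append(task)
                self.pending.remove(task)
                cpus -= task.cpus
                mem -= task.mem_gb
        return launched


class Master:
    """Hypothetical master: decides whom to offer free resources to."""

    def __init__(self, free):
        self.free = dict(free)  # node -> (cpus, mem_gb) currently unallocated
        self.frameworks = []
        self.allocated = {}     # framework name -> CPUs handed out so far

    def register(self, framework):
        self.frameworks.append(framework)
        self.allocated[framework.name] = 0

    def offer_round(self):
        """Offer each node's free resources; return (framework, task, node) placements."""
        placements = []
        for node in list(self.free):
            # Assumed policy: the framework holding the fewest CPUs is offered first.
            order = sorted(self.frameworks, key=lambda f: self.allocated[f.name])
            for fw in order:
                cpus, mem = self.free[node]
                if cpus == 0 or mem == 0:
                    break  # nothing left on this node to offer
                for task in fw.resource_offer(Offer(node, cpus, mem)):
                    cpus -= task.cpus
                    mem -= task.mem_gb
                    self.allocated[fw.name] += task.cpus
                    placements.append((fw.name, task.name, node))
                self.free[node] = (cpus, mem)
        return placements
```

Note that the master never inspects a framework's pending tasks. That is the decentralization the paper argues for, and also why the outcome need not be globally optimal.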


Comments/Critiques:


The paper is very well written and tackles a relevant problem. The authors designed and implemented a real system and took it all the way through: Mesos is now part of the Apache Incubator and is deployed at Twitter and on the 40-node RADLab r-cluster. However, I have a few comments on some aspects of the paper:


  1. Granularity of Scheduling: Mesos relies on a per-framework scheduler and assumes that each framework does its own scheduling. However, it was unclear to me whether that is the right granularity. How about a per-job/per-framework scheduler? Though this has other issues, such as having to rewrite frameworks, it would go a long way toward ensuring per-framework scalability and could even prevent potential conflicts due to two-level scheduling.
  2. Delay Scheduling in Hadoop: It was interesting that delay scheduling was brought up in this paper; however, I was disappointed that the paper just presented some locality improvements in Hadoop that were in no way related to Mesos. I was more interested in the implications of having two levels of scheduling policy. For example, I was curious to know whether delay scheduling might work better if the framework waited for some time before accepting resource offers.
  3. Task Resource/Duration Prediction: The paper assumes that each framework can exactly specify its resource requirements. While such an assumption is already made by MR frameworks (by having one slot per task), it is unclear to me whether it always holds in a fully generic setting.
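
The question raised in comment 2 can be sketched in code. The following is a hypothetical illustration, not something taken from the paper or from the Mesos API: a framework declines offers from nodes that do not hold its data until it has skipped `max_skips` of them, and then accepts whatever comes next. The class name, the `accept` method, and the use of a skip count instead of a wall-clock timeout are assumptions chosen to keep the example deterministic.

```python
class DelayingFramework:
    """Hypothetical framework that holds out for a data-local offer."""

    def __init__(self, preferred_nodes, max_skips):
        self.preferred_nodes = set(preferred_nodes)  # nodes holding the input data
        self.max_skips = max_skips  # non-local offers to decline before giving up
        self.skipped = 0

    def accept(self, node):
        """Return True to launch on `node`, False to decline the offer."""
        if node in self.preferred_nodes:
            self.skipped = 0
            return True   # data-local offer: take it immediately
        if self.skipped >= self.max_skips:
            self.skipped = 0
            return True   # waited long enough: run non-locally
        self.skipped += 1
        return False      # decline and wait for a better offer
```

With `max_skips=2` and a single preferred node `n3`, the framework would decline the first two offers from other nodes and accept the third, trading a short wait for a chance at locality.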
