1 - Summary This paper introduces Mesos, a platform for sharing commodity clusters across multiple diverse distributed computing frameworks. Mesos can be regarded as the kernel of a distributed operating system: it provides fine-grained resource management and runs on every machine of the cluster.
2 - Problem Clusters of commodity servers have become a major computing platform. No single computing framework is optimal for all applications, so organisations need to run multiple frameworks for different computing requirements in the same cluster. Sharing one cluster improves its utilization and allows frameworks to share access to large datasets that are too costly to replicate across clusters. Existing solutions for sharing a cluster, however, neither share data efficiently nor achieve high utilization, because their allocation granularity does not match that of the computing frameworks. Building a better sharing platform is also challenging: frameworks have different scheduling needs, the scheduling system has to scale to clusters with a very large number of nodes, and, very importantly, the system must be fault-tolerant and highly available.
3 - Solution Mesos is designed to address these problems. It introduces a decentralised scheduling abstraction called the resource offer, which delegates control over scheduling to the frameworks: Mesos pushes offers to a framework, each describing resources that the framework can allocate on a cluster node to run tasks. Mesos decides how many resources to offer to each framework based on an organisational policy (such as fair sharing), while each framework decides which offered resources to accept and which tasks to run on them. By providing a common interface for accessing cluster resources, Mesos allows fine-grained sharing across diverse computing frameworks.
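The division of labour described above can be illustrated with a toy, single-process Python sketch. It is not the Mesos API: `Resources`, `Task`, `Framework`, `Master`, `resource_offer` and `offer_round` are hypothetical names, and the rule "offer first to the framework holding the fewest CPUs" is an assumed stand-in for whatever allocation policy Mesos is configured with. The master only decides which framework receives each node's free resources; each framework decides which of them to accept and which tasks to launch (here, a framework with preferred nodes declines other offers, mimicking data locality), and whatever it leaves unused is offered to the next framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: int
    mem_gb: int

    def fits(self, demand: "Resources") -> bool:
        return demand.cpus <= self.cpus and demand.mem_gb <= self.mem_gb

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.cpus - other.cpus, self.mem_gb - other.mem_gb)


@dataclass(frozen=True)
class Task:
    name: str
    demand: Resources


class Framework:
    """Hypothetical framework scheduler: it alone decides which offered resources to use."""

    def __init__(self, name, tasks, preferred_nodes=None):
        self.name = name
        self.pending = list(tasks)
        self.preferred_nodes = preferred_nodes  # e.g. nodes holding its input data
        self.allocated_cpus = 0

    def resource_offer(self, node, offered):
        """Return the tasks to launch on `node`; an empty list declines the offer."""
        if self.preferred_nodes is not None and node not in self.preferred_nodes:
            return []  # decline: the framework prefers other nodes (data locality)
        launched, remaining = [], offered
        for task in list(self.pending):  # iterate over a copy while removing
            if remaining.fits(task.demand):
                remaining = remaining - task.demand
                self.pending.remove(task)
                launched.append(task)
        return launched


class Master:
    """Hypothetical master: decides WHOM to offer resources to, never WHERE tasks run."""

    def __init__(self, free):
        self.free = dict(free)  # node name -> unallocated Resources
        self.placements = []    # (framework, task, node) records

    def offer_round(self, frameworks):
        for node in sorted(self.free):
            # Assumed stand-in policy: offer first to the framework holding the fewest CPUs.
            for fw in sorted(frameworks, key=lambda f: f.allocated_cpus):
                # Each framework sees only what earlier frameworks left unused on this node.
                for task in fw.resource_offer(node, self.free[node]):
                    self.free[node] = self.free[node] - task.demand
                    fw.allocated_cpus += task.demand.cpus
                    self.placements.append((fw.name, task.name, node))


master = Master({"node1": Resources(4, 8), "node2": Resources(4, 8)})
hadoop = Framework("hadoop", [Task("map1", Resources(2, 2)), Task("map2", Resources(2, 2))],
                   preferred_nodes={"node2"})
spark = Framework("spark", [Task(f"iter{i}", Resources(1, 4)) for i in (1, 2, 3)])
master.offer_round([hadoop, spark])
print(master.placements)
# iter3 stays pending: node1's memory is used up and node2's leftover offer has no free CPUs.
print([t.name for t in spark.pending])
```

Because the framework, not the master, chooses placements, "hadoop" can wait for the node it prefers without the master knowing anything about its data layout; that is the design choice the resource-offer abstraction makes.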
4 - Novelty Mesos is a new approach to sharing a cluster across multiple computing frameworks. It shares resources at the fine granularity of individual tasks, and its resource-offer mechanism is a new distributed scheduling approach that leaves task placement to the frameworks rather than to a central scheduler.
5 - Evaluation Mesos is evaluated through a series of experiments on Amazon EC2. The evaluation starts with a macrobenchmark consisting of a mix of four workloads. Compared with a statically partitioned cluster, the Facebook Hadoop Mix, Large Hadoop Mix and Spark workloads achieve a higher share of the cluster and shorter execution times on Mesos. However, the Torque/MPI workload performed worse on Mesos. The paper also evaluates Mesos's overhead, the effectiveness of decentralized scheduling, scalability, and failure recovery.
6 - Opinion Mesos is a reliable platform for sharing commodity clusters across multiple distributed computing frameworks. It achieves high utilization, responds quickly to workload changes, and caters to diverse frameworks while remaining scalable and robust. However, Mesos seems to have a high barrier to entry, since each framework has to be ported to Mesos and implement its own scheduler to make use of resource offers.
- Run multiple frameworks in the same cluster and pick the best one for each application (e.g. Pregel for graph processing, Spark for machine learning).
- Share access to large datasets among different frameworks instead of replicating them, and improve cluster utilization.