1 - Summary This paper introduces Mesos, a platform for sharing commodity clusters among multiple diverse distributed computing frameworks. Mesos can be regarded as the kernel of a distributed operating system: it runs on every machine in the cluster and provides fine-grained resource management.
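The summary does not say how fine-grained sharing is achieved. In the Mesos paper it works through resource offers: the master offers a node's free resources to framework schedulers, and each scheduler accepts only the part it can use for its pending tasks. The following is a toy, single-process model of that loop under simplifying assumptions. The names `Node`, `Framework`, and `offer_round` are hypothetical, and the fixed offer order stands in for Mesos's pluggable allocation policy.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A cluster machine with some free resources (hypothetical model)."""
    name: str
    cpus: int
    mem_gb: int


@dataclass
class Framework:
    """A framework scheduler with identical tasks waiting to run (hypothetical)."""
    name: str
    cpus_per_task: int
    mem_per_task: int
    pending_tasks: int
    launched: List[str] = field(default_factory=list)

    def on_offer(self, node: Node) -> Tuple[int, int]:
        """Accept as many tasks as fit in the offer; return (cpus, mem) used."""
        used_cpus = used_mem = 0
        while (self.pending_tasks > 0
               and node.cpus - used_cpus >= self.cpus_per_task
               and node.mem_gb - used_mem >= self.mem_per_task):
            used_cpus += self.cpus_per_task
            used_mem += self.mem_per_task
            self.pending_tasks -= 1
            self.launched.append(node.name)  # record where the task runs
        return used_cpus, used_mem


def offer_round(nodes: List[Node], frameworks: List[Framework]) -> None:
    """Master loop: offer each node's remaining resources to frameworks in order.

    Simplification: real Mesos decides who receives offers with an
    allocation module (e.g. a fair-sharing policy), not a fixed order.
    """
    for node in nodes:
        for fw in frameworks:
            cpus, mem = fw.on_offer(node)
            node.cpus -= cpus  # accepted resources are no longer free
            node.mem_gb -= mem
```

For example, with two 4-CPU/8 GB nodes, a framework needing 2 CPUs/2 GB per task takes what it can from each offer, and a second framework needing 1 CPU/4 GB per task fills whatever remains. Because each framework chooses which offered resources to accept, frameworks keep control over task placement while the master only decides what to offer.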
1 - Summary The paper proposes Resilient Distributed Datasets (RDDs), an abstraction for sharing data in cluster applications that is more efficient, more general-purpose, and more fault-tolerant than existing data storage abstractions for clusters, and that lets programmers perform in-memory computations. RDDs are implemented in Spark, a big data processing engine, and evaluated on a range of user applications and benchmarks.
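RDDs obtain fault tolerance by recording the coarse-grained transformations (the lineage) used to build each dataset instead of replicating the data, so a lost in-memory partition can be recomputed from its parents. Below is a minimal single-process sketch of that idea, assuming a hypothetical `RDD` class whose `map`, `filter`, and `persist` methods are loosely named after Spark's; it is not Spark's API or implementation, and nothing here is distributed.

```python
from typing import Callable, Dict, List, Optional


class RDD:
    """Toy RDD: an immutable, partitioned dataset that remembers its lineage."""

    def __init__(self, num_partitions: int,
                 compute: Callable[[int], list],
                 parent: Optional["RDD"] = None):
        self.num_partitions = num_partitions
        self._compute = compute          # partition index -> records
        self.parent = parent             # lineage link to the source RDD
        self._cache: Dict[int, list] = {}
        self._persist = False

    @staticmethod
    def parallelize(data: list, num_partitions: int) -> "RDD":
        """Split a local list into round-robin partitions."""
        chunks = [data[i::num_partitions] for i in range(num_partitions)]
        return RDD(num_partitions, lambda i: list(chunks[i]))

    def map(self, f: Callable) -> "RDD":
        # Lazy: only records how to derive each partition from the parent.
        return RDD(self.num_partitions,
                   lambda i: [f(x) for x in self.partition(i)], parent=self)

    def filter(self, pred: Callable) -> "RDD":
        return RDD(self.num_partitions,
                   lambda i: [x for x in self.partition(i) if pred(x)],
                   parent=self)

    def persist(self) -> "RDD":
        """Keep computed partitions in memory for reuse."""
        self._persist = True
        return self

    def partition(self, i: int) -> list:
        if i in self._cache:
            return self._cache[i]
        data = self._compute(i)          # (re)compute from lineage
        if self._persist:
            self._cache[i] = data
        return data

    def lose_partition(self, i: int) -> None:
        """Simulate losing a cached partition (e.g. a failed machine)."""
        self._cache.pop(i, None)

    def collect(self) -> list:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]
```

Calling `lose_partition` and then `collect` again yields the same records: the missing partition is rebuilt by replaying the recorded `map` and `filter` steps on the parent data, not by reading a replica.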
1 - Summary The paper introduces MapReduce, a programming model and framework for easily processing large-scale computations on large clusters of commodity PCs. In the MapReduce model, users write a map function, which map workers apply to splits of the input data to generate a set of intermediate key/value pairs, and a reduce function, which reduce workers apply to merge all intermediate values that share the same key once the framework has grouped and sorted them by key. The paper also presents example programs expressed in the model and an implementation in which a typical MapReduce computation processes terabytes of data on thousands of machines. It further discusses refinements of the model, a performance evaluation of the implementation, the use of MapReduce in Google's indexing system, and related and future work.
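The map, group-by-key, and reduce flow can be illustrated with word count, the paper's canonical example, executed sequentially in one process. This is a sketch under stated assumptions: `run_mapreduce`, `map_fn`, and `reduce_fn` are hypothetical names, the `hash(key) % R` split of intermediate keys across reducers mirrors the default partitioning function described in the paper, and distribution, fault tolerance, and combiners are left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """User-defined map: emit (word, 1) for every word in a document."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """User-defined reduce: merge all counts emitted for one word."""
    return sum(counts)


def run_mapreduce(inputs: List[Tuple[str, str]],
                  mapper: Callable, reducer: Callable,
                  num_reducers: int = 2) -> Dict[str, int]:
    # Map phase: each (key, value) input record produces intermediate pairs,
    # which are partitioned into R buckets by hash(key) % R.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in inputs:
        for k, v in mapper(key, value):
            buckets[hash(k) % num_reducers][k].append(v)

    # Reduce phase: each "reducer" visits its keys in sorted order and
    # calls the user's reduce function once per key with all its values.
    output: Dict[str, int] = {}
    for bucket in buckets:
        for k in sorted(bucket):
            output[k] = reducer(k, bucket[k])
    return output
```

With inputs `[("d1", "the quick fox"), ("d2", "the lazy dog the end")]`, `run_mapreduce(docs, map_fn, reduce_fn)` maps `"the"` to 3 and every other word to 1. Python's string hashing is randomized per process, so which bucket a word lands in can vary between runs, but the final counts do not.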