#university

Review note for Pregel: A System for Large-Scale Graph Processing

Pregel Paper

1 - Summary

Large graphs have been studied for years because they are ubiquitous and commercially valuable, yet existing approaches are limited in locality, efficiency, and flexibility. In this paper, Google introduces Pregel, a vertex-centric computational model for processing large-scale graphs on clusters of commodity machines. Developers program against an abstract API without handling the distribution details behind it. The paper describes the Pregel model and its C++ API, discusses implementation issues, applies the model to several algorithms, reports performance results, and outlines future directions.
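To make the vertex-centric idea concrete, here is a minimal single-process sketch in Python of maximum-value propagation, an example the Pregel paper uses to illustrate supersteps. The function name `pregel_max` and its data layout are hypothetical and are not the paper's C++ API; as a simplification, every vertex is assumed to vote to halt after each superstep and is reactivated only when a message arrives.

```python
def pregel_max(values, edges, max_supersteps=30):
    """Toy vertex-centric max propagation (hypothetical helper, not Pregel's API).

    values: {vertex: int}, edges: {vertex: [out-neighbours]}.
    """
    values = dict(values)
    inbox = {v: [] for v in values}
    active = set(values)  # every vertex is active in superstep 0
    superstep = 0
    while active and superstep < max_supersteps:
        outbox = {v: [] for v in values}
        for v in active:
            # Per-vertex compute step: combine own value with incoming messages.
            new_value = max([values[v]] + inbox[v])
            changed = new_value > values[v]
            values[v] = new_value
            # Send along out-edges in the first superstep or after a change.
            if superstep == 0 or changed:
                for neighbour in edges.get(v, []):
                    outbox[neighbour].append(new_value)
        # Messages are delivered at the superstep barrier; a vertex is
        # active in the next superstep only if it received a message.
        inbox = outbox
        active = {v for v, msgs in inbox.items() if msgs}
        superstep += 1
    return values


graph_values = {"a": 3, "b": 6, "c": 2, "d": 1}
graph_edges = {"a": ["b"], "b": ["a", "d"], "c": ["b", "d"], "d": ["c"]}
result = pregel_max(graph_values, graph_edges)
```

On this small strongly connected graph, every vertex ends up holding the largest value, and the loop stops once no messages are in flight. A real deployment would partition vertices across workers; this sketch only shows the compute-then-exchange rhythm.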


Review note for Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

RDD paper

1 - Summary

The paper proposes Resilient Distributed Datasets (RDDs), an abstraction for sharing data in cluster applications that lets programmers perform in-memory computations. According to the paper, RDDs are more efficient, general-purpose, and fault-tolerant than existing data storage abstractions for clusters. RDDs are implemented in a big data processing engine called Spark and evaluated on a range of user applications and benchmarks.
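The fault-tolerance claim can be illustrated with a toy model. The RDD paper attributes recovery to lineage: a dataset remembers the transformations that produced it, so lost in-memory data can be recomputed. The `ToyRDD` class below is a hypothetical, single-machine Python sketch of that idea and is not Spark's API.

```python
class ToyRDD:
    """Hypothetical illustration of lineage-based recovery (not Spark's API)."""

    def __init__(self, source=None, parent=None, transform=None):
        self._source = source        # base data, only set on the root dataset
        self._parent = parent        # lineage: the dataset this one derives from
        self._transform = transform  # lineage: how to derive it from the parent
        self._persist = False
        self._cache = None

    def map(self, fn):
        return ToyRDD(parent=self, transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return ToyRDD(parent=self, transform=lambda rows: [r for r in rows if pred(r)])

    def persist(self):
        # Ask for the computed rows to be kept in memory after first use.
        self._persist = True
        return self

    def collect(self):
        if self._cache is not None:
            return list(self._cache)
        if self._parent is None:
            rows = list(self._source)
        else:
            # Recompute from lineage by walking back through the parents.
            rows = self._transform(self._parent.collect())
        if self._persist:
            self._cache = rows
        return list(rows)

    def lose_cache(self):
        # Simulate losing the in-memory copy, e.g. after a node failure.
        self._cache = None


base = ToyRDD(source=range(10))
even_squares = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x).persist()
first = even_squares.collect()
even_squares.lose_cache()
recovered = even_squares.collect()
```

After the simulated loss, `collect()` rebuilds the same rows from the recorded transformations instead of reading a replicated copy.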


Review note for MapReduce: Simplified Data Processing on Large Clusters

MapReduce Paper

1 - Summary

The paper introduces MapReduce, a programming framework that simplifies large-scale computations on large clusters of commodity PCs. Users supply a map function, which map workers apply to splits of the input data to generate a set of intermediate key/value pairs, and a reduce function, which reduce workers apply to merge the intermediate values that share a key once the intermediate data has been sorted by key. The paper also gives example programs and describes an implementation that processes terabytes of data on thousands of machines. It further discusses refinements of the model, a performance evaluation of the implementation, the use of MapReduce in Google's indexing system, and related and future work.
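The map and reduce roles described above can be sketched with word counting, the canonical example from the MapReduce paper. The driver `run_mapreduce` below is a hypothetical single-process stand-in for the distributed runtime, and the split names are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_fn(_split_id, text):
    """User-defined map: emit (word, 1) for each word in one input split."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """User-defined reduce: merge all intermediate values that share a key."""
    return word, sum(counts)


def run_mapreduce(splits, map_fn, reduce_fn):
    # Map phase: in a real cluster each split goes to a separate map worker.
    intermediate = chain.from_iterable(map_fn(k, v) for k, v in splits.items())
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: visit keys in sorted order and merge each group.
    return dict(reduce_fn(key, groups[key]) for key in sorted(groups))


splits = {
    "split-0": "the quick brown fox",
    "split-1": "the lazy dog and the fox",
}
word_counts = run_mapreduce(splits, map_fn, reduce_fn)
```

Only `map_fn` and `reduce_fn` are user code here; everything in `run_mapreduce` stands in for work the framework does on the user's behalf.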

