# Hadoop

Review note for "MapReduce: Simplified Data Processing on Large Clusters"

MapReduce Paper

1 - Summary

The paper introduces MapReduce, a programming framework that makes it easy to run large-scale computations on large clusters of commodity PCs. In the MapReduce model, users supply a map function, which the framework runs on map workers to process splits of the input data and emit a set of intermediate key/value pairs, and a reduce function, which runs on reduce workers to merge all intermediate values that share the same key. The paper also provides example programs for the model and describes a typical implementation of MapReduce that processes terabytes of data on thousands of machines. Refinements of the model, a performance evaluation of the implementation, the use of MapReduce in Google's production indexing system, and related and future work are also discussed.
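To make the map/reduce split concrete, below is a minimal word-count sketch in the spirit of the paper's canonical example, written against the Hadoop Java MapReduce API (the class names `WordCount`, `TokenizerMapper`, and `IntSumReducer` are illustrative, not from the paper):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for each line in an input split, emit an intermediate
  // (word, 1) key/value pair.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: the framework groups intermediate pairs by key, so the
  // reducer receives one word together with all of its counts and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // Combiner: local pre-aggregation on map workers, one of the
    // refinements discussed in the paper.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged as a jar, this would typically be submitted with something like `hadoop jar wordcount.jar WordCount <input-dir> <output-dir>`; the framework handles splitting the input, scheduling map and reduce workers, and grouping intermediate pairs by key.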
