#Spark

Review note for Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

RDD paper

1 - Summary

The paper proposes an abstraction for sharing data in cluster applications, called Resilient Distributed Datasets (RDDs), that is more efficient, more general-purpose, and more fault-tolerant than existing data storage abstractions for clusters, and that lets programmers perform in-memory computations. The authors implement RDDs in Spark, a big data processing engine, and evaluate them on a range of user applications and benchmarks.
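To make the fault-tolerance claim concrete, here is a toy, single-process Python sketch of the idea behind RDDs: a derived dataset does not replicate its data but records its lineage (the parent dataset and the coarse-grained transformation applied to it), so a lost in-memory partition can be recomputed instead of restored from a copy. `ToyRDD`, `compute`, and `lose_partition` are hypothetical names, and although `parallelize`, `map`, `filter`, `persist`, and `collect` echo Spark's operation names, this is not Spark's API or implementation.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy, single-process model of an RDD (hypothetical; not Spark's API).

    A dataset is either a base dataset holding its partitions directly, or a
    derived dataset that only records its lineage: the parent dataset plus the
    per-partition function that turns a parent partition into this one.
    """

    def __init__(self, source: Optional[List[list]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[list], list]] = None) -> None:
        self.source = source  # partitions of a base dataset (treated as stable input)
        self.parent = parent  # lineage: the dataset this one was derived from
        self.fn = fn          # lineage: transformation applied to each parent partition
        self.num_partitions = len(source) if source is not None else parent.num_partitions
        self.cache: Dict[int, list] = {}  # in-memory copies of computed partitions
        self.persisted = False

    @staticmethod
    def parallelize(data: list, num_partitions: int) -> "ToyRDD":
        # Split the input into contiguous chunks, one per partition.
        size = -(-len(data) // num_partitions)  # ceiling division
        chunks = [data[i * size:(i + 1) * size] for i in range(num_partitions)]
        return ToyRDD(source=chunks)

    # Coarse-grained transformations: they apply the same function to every
    # element and only build a new lineage node; nothing is computed yet.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def persist(self) -> "ToyRDD":
        # Ask for computed partitions to be kept in memory for reuse.
        self.persisted = True
        return self

    def compute(self, i: int) -> list:
        if i in self.cache:  # fast path: partition already in memory
            return self.cache[i]
        if self.source is not None:
            result = list(self.source[i])
        else:
            # Recompute from lineage: materialize the parent partition, then apply fn.
            result = self.fn(self.parent.compute(i))
        if self.persisted:
            self.cache[i] = result
        return result

    def collect(self) -> list:
        # An action: forces evaluation of every partition, in order.
        out: list = []
        for i in range(self.num_partitions):
            out.extend(self.compute(i))
        return out

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure that drops one in-memory partition.
        self.cache.pop(i, None)


nums = ToyRDD.parallelize(list(range(10)), num_partitions=3)
even_squares = nums.map(lambda x: x * x).filter(lambda x: x % 2 == 0).persist()
print(even_squares.collect())  # → [0, 4, 16, 36, 64]
even_squares.lose_partition(1)  # the cached copy of partition 1 is gone
print(even_squares.collect())  # → [0, 4, 16, 36, 64], partition 1 rebuilt from lineage
```

The design choice this illustrates is that recovery cost is paid only on failure: because transformations are coarse-grained, logging one function per dataset is enough to rebuild any lost partition, instead of replicating every element across machines.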

