Article | Apr 25, 2012 | Closed access
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
University of California, Berkeley
Abstract
We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent…
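The contrast between coarse-grained transformations and fine-grained updates may be easier to see in a toy example. The sketch below is a single-process illustration in Python, not Spark's API: the class `ToyRDD` and its methods are hypothetical names invented here. It assumes one reading of the design, namely that each dataset records the whole-dataset transformation that produced it, so an in-memory copy that is lost can be recomputed instead of being restored from a replica.

```python
class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (illustration only)."""

    def __init__(self, source=None, parent=None, transform=None):
        self._source = source        # callable producing the root data (root only)
        self._parent = parent        # the dataset this one was derived from
        self._transform = transform  # function applied to the parent's full row list
        self._cache = None           # in-memory copy; may be dropped at any time

    @classmethod
    def from_list(cls, items):
        data = list(items)           # snapshot the input once
        return cls(source=lambda: list(data))

    def map(self, fn):
        # Coarse-grained: one function applied to every row, no per-cell writes.
        return ToyRDD(parent=self, transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return ToyRDD(parent=self, transform=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        # Rebuild the data from the recorded transformation if no cached copy exists.
        if self._cache is None:
            if self._parent is None:
                self._cache = self._source()
            else:
                self._cache = self._transform(self._parent.collect())
        return list(self._cache)

    def drop_cache(self):
        """Simulate losing this dataset's in-memory copy (e.g. a node failure)."""
        self._cache = None


base = ToyRDD.from_list(range(10))
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

first = evens_squared.collect()
evens_squared.drop_cache()        # the cached result is gone
second = evens_squared.collect()  # recomputed from the parent dataset
```

Because datasets here are never mutated in place, recomputing a lost result only requires replaying the recorded transformation on the parent. A real system would additionally partition the data across machines, which this sketch omits.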
Citation impact
- Total citations: 3,577
- FWCI: 521.87
- Percentile: 100%
- References: 39
Authors: 9
Topics & keywords
- Computer science
- Fault tolerance
- Abstraction
- Distributed computing
- Class (philosophy)
- Programming paradigm
- SPARK (programming language)
- Variety (cybernetics)