Article · Apr 25, 2012 · Closed access

Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing

University of California, Berkeley

Abstract

We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent…
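The idea of trading fine-grained updates for coarse-grained transformations can be illustrated with a small model. The sketch below is a minimal, single-process illustration under stated assumptions, not Spark's API or the paper's implementation: the `ToyRDD` class, its `compute`, `lose_partition`, and `computations` members are all hypothetical names. Each derived dataset records only its lineage (its parent plus the function applied to every partition). A lost in-memory partition is therefore rebuilt by re-running that function rather than restored from a replica. For simplicity it models only one-to-one partition dependencies (no shuffles) and simulates a failure by dropping a cached partition.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical single-process model of an RDD (not Spark's API).

    A dataset is immutable and split into partitions. Instead of replicating
    its contents, each derived dataset records its lineage: the parent it came
    from and the per-partition function that produced it.
    """

    def __init__(self, partitions: Optional[List[list]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[list], list]] = None):
        self._source = partitions      # base data, only set for root datasets
        self._parent = parent          # lineage: the dataset this one derives from
        self._fn = fn                  # function applied to each parent partition
        if partitions is not None:
            self.num_partitions = len(partitions)
        else:
            self.num_partitions = parent.num_partitions
        self._cache: Dict[int, list] = {}  # materialized partitions kept in memory
        self._persist = False
        self.computations = 0          # how many partitions were (re)computed

    # Coarse-grained transformations: the same operation is applied to every
    # element, producing a new dataset; nothing is updated in place.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def persist(self) -> "ToyRDD":
        """Keep computed partitions in memory for reuse across queries."""
        self._persist = True
        return self

    def compute(self, i: int) -> list:
        if i in self._cache:
            return self._cache[i]
        self.computations += 1
        if self._parent is None:
            data = list(self._source[i])
        else:
            # Rebuild the partition from the parent's matching partition.
            data = self._fn(self._parent.compute(i))
        if self._persist:
            self._cache[i] = data
        return data

    def collect(self) -> list:
        return [x for i in range(self.num_partitions) for x in self.compute(i)]

    def lose_partition(self, i: int) -> None:
        """Simulate a failed node by discarding one in-memory partition."""
        self._cache.pop(i, None)


base = ToyRDD([[1, 2, 3], [4, 5, 6]])
squares = base.map(lambda x: x * x).persist()
evens = squares.filter(lambda x: x % 2 == 0)
print(evens.collect())  # [4, 16, 36]

squares.lose_partition(1)      # partition 1 of `squares` is gone
print(evens.collect())         # [4, 16, 36], rebuilt from lineage
print(squares.computations)    # 3: two initial computes + one recovery
```

Only the lost partition is recomputed; the surviving cached partition is reused as is. The same caching is what lets an iterative algorithm or an interactive query reuse `squares` across many passes without rereading the base data.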

Citation impact

  • Total citations: 3,577
  • FWCI: 521.87
  • Percentile: 100%
  • References: 39

Authors: 9

Topics & keywords

Keywords
  • Computer science
  • Fault tolerance
  • Abstraction
  • Distributed computing
  • Class (philosophy)
  • Programming paradigm
  • SPARK (programming language)
  • Variety (cybernetics)