Spark: cluster computing with working sets

Zaharia, Matei; Chowdhury, Mosharaf; Franklin, Michael J.; Shenker, Scott; Stoica, Ion

Article · Jun 22, 2010 · Closed access

Abstract

MapReduce and its variants have been highly successful in implementing large-scale data-intensive applications on commodity clusters. However, most of these systems are built around an acyclic data flow model that is not suitable for other popular applications. This paper focuses on one such class of applications: those that reuse a working set of data across multiple parallel operations. This includes many iterative machine learning algorithms, as well as interactive data analysis tools. We propose a new framework called Spark that supports these applications while retaining the scalability and fault tolerance of MapReduce. To achieve these goals, Spark introduces an abstraction called resilient distributed…

Citation impact

Total citations: 4,236

FWCI: 134.27
Percentile: 100%
References: 20

Authors: 5

Topics & keywords

Keywords

Computer science
SPARK (programming language)
Scalability
Partition (number theory)
Big data
Abstraction
Distributed computing
Set (abstract data type)

UN Sustainable Development Goals

Industry, innovation and infrastructure
