Article · Jun 29, 2009 · Closed access

A comparison of approaches to large-scale data analysis

Brown University · University of Wisconsin–Madison · +3 more institutions

Indexed in Crossref

Abstract

There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system's performance for various degrees of parallelism on a cluster…

Citation impact

Total citations: 1,075
FWCI: 239.88
Percentile: 100%
References: 19
Authors

7

Topics & keywords

Keywords
  • Computer science
  • Benchmark (surveying)
  • Task (project management)
  • Parallelism (grammar)
  • Process (computing)
  • Xeon Phi
  • Control flow
  • Scale (ratio)