Improving MapReduce performance in heterogeneous environments

Zaharia, Matei; Konwinski, Andy; Joseph, Anthony D.; Katz, Randy H.; Stoica, Ion

doi:10.5555/1855741.1855744

Article | UC Berkeley | Dec 8, 2008 | Closed access

University of California, Berkeley

Abstract

MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-source implementation of MapReduce enjoying wide adoption and is often used for short jobs where low response time is critical. Hadoop's performance is closely tied to its task scheduler, which implicitly assumes that cluster nodes are homogeneous and tasks make progress linearly, and uses these assumptions to decide when to speculatively re-execute tasks that appear to be stragglers. In practice, the homogeneity assumptions do not always hold. An especially compelling setting where this occurs is a virtualized data center, such as…
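The abstract names two scheduler assumptions: homogeneous nodes and linear task progress. The Python sketch below shows how a progress-based speculation check depends on each one. It is not Hadoop's scheduler code, and several details are illustrative assumptions rather than anything taken from the paper: the `Task` record, the 0.2 progress gap, and the 60-second minimum runtime. Comparing a task with the average of its peers only makes sense if nodes are homogeneous. Extrapolating the observed progress rate only makes sense if progress is linear.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record used only for this illustration."""
    task_id: str
    progress: float   # progress score in [0, 1]
    elapsed_s: float  # seconds since the task started


def find_stragglers(tasks, gap=0.2, min_runtime_s=60.0):
    """Flag tasks whose progress lags the average by more than `gap`.

    Homogeneity assumption: a task behind the average is presumed to be
    on a slow node, so re-executing it elsewhere should help. The `gap`
    and `min_runtime_s` defaults are illustrative, not authoritative.
    """
    if not tasks:
        return []
    avg = sum(t.progress for t in tasks) / len(tasks)
    return [
        t.task_id
        for t in tasks
        if t.elapsed_s >= min_runtime_s and t.progress < avg - gap
    ]


def estimated_time_left(task):
    """Linear-progress assumption: extrapolate the observed average rate.

    rate = progress / elapsed, so time left = (1 - progress) / rate.
    Returns infinity when no progress (or no time) has been observed yet.
    """
    if task.progress <= 0 or task.elapsed_s <= 0:
        return float("inf")
    rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / rate


if __name__ == "__main__":
    tasks = [Task("a", 0.9, 120.0), Task("b", 0.85, 120.0), Task("c", 0.3, 120.0)]
    print(find_stragglers(tasks))
    print(estimated_time_left(tasks[2]))
```

With three tasks at progress 0.9, 0.85, and 0.3, the average is about 0.68. Only task `c` falls more than 0.2 below it, so only `c` is flagged. Its estimated time left is about 280 seconds. On heterogeneous hardware, or when a task's phases run at different speeds, both signals can mislead. A task on a fast node may look slow because it started late, and a rate measured early in the task may not hold later.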

Citation impact

1,618 total citations
FWCI: 188.95
Percentile: 100%
References: 20


Topics & keywords

Keywords

Computer science
Cloud computing
Scheduling (production processes)
Search engine indexing
Virtual machine
Distributed computing
Big data
Homogeneous

UN Sustainable Development Goals

Industry, innovation and infrastructure
