MapReduce online

Condie, Tyson; Conway, Neil; Alvaro, Peter; Hellerstein, Joseph M.; Elmeleegy, Khaled; Sears, Russell

doi:10.5555/1855711.1855732

Article · Apr 28, 2010 · Closed access

University of California, Berkeley

Abstract

MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire output of each map and reduce task before it can be consumed. In this paper, we propose a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. We present a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see early returns from a job as it is being computed. Our Hadoop Online Prototype (HOP) also…
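The abstract contrasts materializing each map task's full output with pipelining it to the reducer, which makes early results (online aggregation) possible. The minimal single-process Python sketch below assumes a word-count job. `batch_job` buffers all map output before reducing. `pipelined_job` hands each map output pair to the reducer as soon as it is produced and periodically yields a partial answer. The function names, the `snapshot_every` trigger, and the word-count workload are hypothetical illustrations. This is not HOP's API, and it ignores distribution, partitioning, and fault tolerance.

```python
from collections import Counter
from collections.abc import Iterator


def map_task(record: str) -> Iterator[tuple[str, int]]:
    """Word-count map function (illustrative): emit (word, 1) per word."""
    for word in record.split():
        yield word, 1


def batch_job(records: list[str]) -> dict[str, int]:
    """Batch style: materialize the entire map output, then reduce it."""
    materialized = [pair for record in records for pair in map_task(record)]
    totals = Counter()
    for word, count in materialized:  # reduce starts only after all maps finish
        totals[word] += count
    return dict(totals)


def pipelined_job(
    records: list[str], snapshot_every: int
) -> Iterator[tuple[float, dict[str, int]]]:
    """Pipelined style (hypothetical helper): each map output pair is handed
    to the reducer immediately, and a partial result (an "early return") is
    yielded after every `snapshot_every` input records and at the end."""
    totals = Counter()
    n = len(records)
    for i, record in enumerate(records, start=1):
        for word, count in map_task(record):
            totals[word] += count  # consumed as soon as it is produced
        if i % snapshot_every == 0 or i == n:
            yield i / n, dict(totals)  # (fraction of input seen, partial answer)


if __name__ == "__main__":
    records = ["a b a", "b c", "a c c", "b b"]
    print(batch_job(records))  # {'a': 3, 'b': 4, 'c': 3}
    for progress, partial in pipelined_job(records, snapshot_every=2):
        print(f"{progress:.0%}", partial)
    # 50% {'a': 2, 'b': 2, 'c': 1}
    # 100% {'a': 3, 'b': 4, 'c': 3}
```

The 50% snapshot is available before all input has been mapped, and the final snapshot matches the batch answer. The trade-off, as the abstract notes, is that materializing output is what simplifies fault tolerance in conventional MapReduce implementations.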

Citation impact

Total citations: 683
FWCI: 204.27
Percentile: 100%
References: 30

Authors: 6

Topics & keywords

Keywords

Computer science
Implementation
Fault tolerance
Big data
Task (project management)
Distributed computing
Stream processing
Programming paradigm

UN Sustainable Development Goals

Industry, innovation and infrastructure

No related works found for this paper.