Apache Hadoop YARN

Vavilapalli, Vinod Kumar; Murthy, Arun C.; Douglas, Chris; Agarwal, Sharad; Konar, Mahadev; Evans, Robert; Graves, Thomas; Lowe, Jason; Shah, Hitesh; Seth, Siddharth; Saha, Bikas; Curino, Carlo; O'Malley, Owen; Radia, Sanjay; Reed, Benjamin; Baldeschwieler, Eric

doi:10.1145/2523616.2523633

Article · Oct 1, 2013 · Closed access


Hortonworks (United States) · Microsoft Research (United Kingdom) · +3 more institutions

Indexed in Crossref

Abstract

The initial design of Apache Hadoop [1] was tightly focused on running massive MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agorá: the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage have stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler.

Citation impact

Total citations: 1,836

FWCI: 411.36
Percentile: 100%
References: 29

Authors: 16

Topics & keywords

Keywords

Computer science
Scalability
Yarn
Big data
Programming paradigm
Distributed computing
Forcing (mathematics)
Database

UN Sustainable Development Goals

Industry, innovation and infrastructure
