Article · Operating Systems Design and Implementation · Oct 4, 2010 · Closed access

Reining in the outliers in map-reduce clusters using Mantri

Microsoft (United States) · Berkeley College

Abstract

Experience from an operational Map-Reduce cluster reveals that outliers significantly prolong job completion. The causes for outliers include run-time contention for processor, memory and other resources, disk failures, varying bandwidth and congestion along network paths, and imbalance in task workload. We present Mantri, a system that monitors tasks and culls outliers using cause- and resource-aware techniques. Mantri's strategies include restarting outliers, network-aware placement of tasks and protecting outputs of valuable tasks. Using real-time progress reports, Mantri detects and acts on outliers early in their lifetime. Early action frees up resources that can be used by subsequent tasks and expedites…
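To make the idea of acting on progress reports concrete, here is a minimal sketch of one generic way to flag slow tasks. It is not Mantri's actual algorithm, which the abstract does not spell out. The function names, the linear extrapolation of remaining time, the median baseline, and the `threshold` of 2.0 are all assumptions chosen for illustration.

```python
from statistics import median


def estimate_remaining(elapsed_s, progress):
    """Linearly extrapolate remaining run time from one progress report.

    Assumption: work proceeds at a constant rate, so a task that took
    elapsed_s seconds to reach `progress` needs proportionally more time
    for the rest.
    """
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    return elapsed_s * (1.0 - progress) / progress


def find_outliers(reports, threshold=2.0):
    """Return sorted ids of tasks whose estimated remaining time exceeds
    `threshold` times the median estimate over all reported tasks.

    `reports` maps a task id to (elapsed seconds, fraction complete).
    Both the median baseline and the threshold are hypothetical choices.
    """
    if not reports:
        return []
    remaining = {
        task_id: estimate_remaining(elapsed, progress)
        for task_id, (elapsed, progress) in reports.items()
    }
    typical = median(remaining.values())
    return sorted(t for t, r in remaining.items() if r > threshold * typical)


# Hypothetical progress reports: three tasks on pace, one far behind.
reports = {
    "t1": (60.0, 0.75),
    "t2": (60.0, 0.80),
    "t3": (60.0, 0.10),
    "t4": (55.0, 0.70),
}
outliers = find_outliers(reports)
```

With these made-up reports, only `t3` is flagged. A real scheduler would also need to weigh the cause of the slowdown and the cost of a restart before acting, which this sketch leaves out.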

Citation impact

Total citations: 653
FWCI: 101.20
Percentile: 100%
References: 36

Authors

7

Topics & keywords

Keywords
  • Outlier
  • Computer science
  • Software deployment
  • Workload
  • Task (project management)
  • Resource (disambiguation)
  • Real-time computing
  • Artificial intelligence
UN Sustainable Development Goals
  • Decent work and economic growth