Reining in the outliers in map-reduce clusters using Mantri
Microsoft (United States) · Berkeley College
Abstract
Experience from an operational Map-Reduce cluster reveals that outliers significantly prolong job completion. The causes of outliers include run-time contention for processor, memory, and other resources; disk failures; varying bandwidth and congestion along network paths; and imbalance in task workload. We present Mantri, a system that monitors tasks and culls outliers using cause- and resource-aware techniques. Mantri's strategies include restarting outliers, network-aware placement of tasks, and protecting the outputs of valuable tasks. Using real-time progress reports, Mantri detects and acts on outliers early in their lifetime. Early action frees up resources that can be used by subsequent tasks and expedites…
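The abstract says progress reports let outliers be detected and restarted early. The sketch below illustrates that idea under stated assumptions; it is not Mantri's actual decision rule. It assumes a task keeps its average rate so far, so remaining time is extrapolated linearly. It also assumes a fresh copy would take about the median duration of already-completed tasks in the same phase. A task is flagged for restart when its estimated remaining time exceeds a hypothetical `slack` multiple of that estimate. `TaskReport`, `estimate_remaining`, `pick_restarts`, and `slack` are illustrative names, not part of Mantri or any real API.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class TaskReport:
    """Hypothetical progress report for a running task."""
    task_id: str
    progress: float  # fraction of the task's input processed, in [0, 1]
    elapsed: float   # seconds since the task started


def estimate_remaining(report: TaskReport) -> Optional[float]:
    """Extrapolate remaining seconds, assuming the task keeps its average rate.

    Returns None when no progress has been reported yet, since no rate
    can be estimated (a just-started task should not be flagged).
    """
    if report.progress <= 0 or report.elapsed <= 0:
        return None
    rate = report.progress / report.elapsed  # fraction of input per second
    return (1.0 - report.progress) / rate


def pick_restarts(reports: List[TaskReport],
                  completed_durations: List[float],
                  slack: float = 2.0) -> List[str]:
    """Return ids of tasks whose estimated remaining time exceeds `slack`
    times the median duration of completed tasks (the assumed cost of a
    fresh copy)."""
    if not completed_durations:
        return []  # no baseline yet: take no action
    t_new = median(completed_durations)
    flagged = []
    for r in reports:
        t_rem = estimate_remaining(r)
        if t_rem is not None and t_rem > slack * t_new:
            flagged.append(r.task_id)
    return flagged


if __name__ == "__main__":
    reports = [
        TaskReport("t1", 0.5, 10.0),  # on pace: about 10 s left
        TaskReport("t2", 0.1, 10.0),  # slow: about 90 s left
        TaskReport("t3", 0.0, 1.0),   # no progress reported yet
    ]
    completed = [20.0, 22.0, 18.0]  # median fresh-copy estimate: 20 s
    print(pick_restarts(reports, completed))
```

Acting on the rate rather than waiting for a task to exceed a fixed deadline mirrors the abstract's point that early action frees resources sooner. Skipping tasks with no progress yet avoids restarting copies that simply have not reported.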
Citation impact
- FWCI: 101.20
- Percentile: 100%
- References: 36
Authors: 7
Topics & keywords
- Outlier
- Computer science
- Software deployment
- Workload
- Task (project management)
- Resource
- Real-time computing
- Artificial intelligence
- Decent work and economic growth