Abstract
Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an…
Citation impact
- Total citations: 581
- FWCI: 127.01
- Percentile: 100%
- References: 27
Authors: 4
Topics & keywords
Topics
Keywords
- Computer science
- Scheduling (production processes)
- Latency (audio)
- Distributed computing
- Sparrow
- Millisecond
- Schedule
- Parallel computing
UN Sustainable Development Goals
- Industry, innovation and infrastructure
Funding
- National Science Foundation (Awards: 1139158, ISTC-CC, CCF-1139158)
- U.S. Department of Defense
- Intel Corporation
- General Electric
- Microsoft
- Cisco Systems
- Oracle
- SAP North America
- Facebook
- Hertz Foundation
- Google
- Amazon Web Services
- NetApp
- VMware
- International Science and Technology Center
- Huawei Technologies
- Directorate for Computer and Information Science and Engineering (Award: CCF-1139158)
- Defense Advanced Research Projects Agency (Awards: A8750-12-2-0331, FA8750, XData Award FA8750-12-2-0331, FA8750-12-2-0331)
- Samsung
- Division of Computing and Communication Foundations (Award: CCF-1139158)