Abstract
Many "big data" applications must act on data in real time. Running these applications at ever-larger scales requires parallel platforms that automatically handle faults and stragglers. Unfortunately, current distributed stream processing models provide fault recovery in an expensive manner, requiring hot replication or long recovery times, and do not handle stragglers. We propose a new processing model, discretized streams (D-Streams), that overcomes these challenges. D-Streams enable a parallel recovery mechanism that improves efficiency over traditional replication and backup schemes, and tolerates stragglers. We show that they support a rich set of operators while attaining high per-node throughput similar…
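In the full paper, D-Streams treat a streaming computation as a series of small, deterministic batch computations, one per fixed time interval. Because each interval's input is retained and every task is deterministic, a lost partition of a result can be recomputed from its input. Many lost partitions can also be recomputed in parallel on different nodes, which is the parallel recovery the abstract refers to. The sketch below is a toy, single-process illustration built on those assumptions. The names `discretize`, `word_count`, and `DStreamSketch` are hypothetical and are not the Spark Streaming API, and a thread pool stands in for a cluster of nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def discretize(records, interval):
    """Group (timestamp, value) records into batches, one per time interval."""
    batches = {}
    for ts, value in records:
        batches.setdefault(int(ts // interval), []).append(value)
    return dict(sorted(batches.items()))


def word_count(words):
    """Deterministic task: the same input partition always gives the same output."""
    return Counter(words)


class DStreamSketch:
    """Hypothetical toy model: each (batch, partition) result can be rebuilt from retained input."""

    def __init__(self, records, interval, num_partitions):
        # Retained input partitions act as the lineage source for recomputation.
        self.inputs = {}
        for batch, values in discretize(records, interval).items():
            for p in range(num_partitions):
                self.inputs[(batch, p)] = values[p::num_partitions]
        self.results = {}

    def compute(self, key):
        return word_count(self.inputs[key])

    def run(self):
        # Normal operation: process every partition of every interval.
        for key in self.inputs:
            self.results[key] = self.compute(key)

    def fail(self, keys):
        # Simulate a node failure by discarding some computed partitions.
        for key in keys:
            self.results.pop(key, None)

    def recover(self, workers=4):
        # Recompute all lost partitions concurrently; threads stand in for nodes.
        lost = [key for key in self.inputs if key not in self.results]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for key, out in zip(lost, pool.map(self.compute, lost)):
                self.results[key] = out
        return lost

    def batch_output(self, batch):
        # Merge the per-partition counts for one interval.
        total = Counter()
        for (b, _), counts in self.results.items():
            if b == batch:
                total.update(counts)
        return total
```

For example, with five timestamped words, an interval of 1.0, and two partitions, discarding partitions `(0, 0)` and `(1, 1)` and then calling `recover()` restores the same per-interval counts as before the simulated failure. Determinism is what makes this safe: recomputation cannot produce a different answer. Spreading the recomputation over many workers is what shortens recovery compared with replaying on a single standby.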
Authors: 6
Keywords
- Computer science
- Stream processing
- Backup
- Data stream mining
- Distributed computing
- Throughput
- Fault tolerance
- Big data
Funding
- National Science Foundation (Awards: 1139158, CCF-1139158)
- Intel Corporation
- General Electric
- Microsoft
- Cisco Systems
- Oracle
- SAP North America
- Facebook
- Google
- Amazon Web Services
- NetApp
- VMware
- Huawei Technologies
- Directorate for Computer and Information Science and Engineering (Award: CCF-1139158)
- Defense Advanced Research Projects Agency (XData Award FA8750-12-2-0331)
- Samsung
- Division of Computing and Communication Foundations (Award: CCF-1139158)