BlinkDB
University of California, Berkeley · Massachusetts Institute of Technology
Abstract
In this paper, we present BlinkDB, a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional stratified samples from the original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy or response-time requirements. We evaluate BlinkDB against the…
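A toy version of the two ideas in the abstract can be sketched in Python. This is not BlinkDB's implementation: the helper names (`stratified_sample`, `stratified_avg`, `build_samples`, `select_and_answer`), the single stratification column, the per-group row cap, and the normal-approximation 95% error bar (z = 1.96) are all assumptions made for illustration. Each precomputed sample keeps at most `cap` rows per distinct value of one column, an AVG query is answered from a sample with a stratified estimate plus an error-bar half-width, and the selector returns the smallest precomputed sample whose error bar meets the requested bound.

```python
import math
import random
import statistics
from collections import defaultdict


def stratified_sample(rows, key, cap, seed=0):
    """Hypothetical helper: keep at most `cap` rows per distinct value of `key`.

    Rare groups are kept whole; frequent groups are down-sampled uniformly.
    Returns a list of (group_value, group_size_in_table, kept_rows).
    """
    if cap < 2:
        raise ValueError("cap must be >= 2 so per-group variance is defined")
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for value, members in groups.items():
        kept = members if len(members) <= cap else rng.sample(members, cap)
        sample.append((value, len(members), kept))
    return sample


def stratified_avg(sample, col, z=1.96):
    """Estimate AVG(col) over the full table and a ~95% error-bar half-width."""
    total = sum(n_group for _, n_group, _ in sample)
    estimate = 0.0
    variance = 0.0
    for _, n_group, kept in sample:
        weight = n_group / total          # share of the table in this group
        values = [row[col] for row in kept]
        estimate += weight * statistics.fmean(values)
        fpc = 1 - len(values) / n_group   # finite-population correction
        if fpc > 0:                       # fully kept groups add no error
            variance += weight ** 2 * fpc * statistics.variance(values) / len(values)
    return estimate, z * math.sqrt(variance)


def build_samples(rows, key, caps, seed=0):
    """Precompute one stratified sample per cap (done offline, not per query)."""
    return {cap: stratified_sample(rows, key, cap, seed) for cap in sorted(caps)}


def select_and_answer(samples, col, max_error, z=1.96):
    """Return (estimate, half_width, cap) from the smallest sample meeting max_error.

    Falls back to the largest sample if none meets the bound.
    """
    if not samples:
        raise ValueError("no samples available")
    for cap in sorted(samples):
        estimate, half_width = stratified_avg(samples[cap], col, z)
        if half_width <= max_error:
            break
    return estimate, half_width, cap
```

Capping rows per group is what makes these samples stratified: a uniform sample of the same size could miss a small group entirely, whereas here every group with few rows is kept in full. Trying the precomputed samples in increasing size order is a deliberate simplification and does not reproduce BlinkDB's own selection strategy, which the abstract says can also target response-time bounds.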
Citation impact
- FWCI: 67.29
- Percentile: 100%
- References: 40
Authors: 6
Topics & keywords
- Computer science
- SQL
- Workload
- Node (computing)
- Set (abstract data type)
- Sample (statistics)
- Query optimization
Funding
- National Science Foundation
- Intel Corporation
- General Electric
- Microsoft
- Cisco Systems
- Oracle
- SAP North America
- Qualcomm
- Facebook
- Google
- University of California, Berkeley
- Amazon Web Services
- NetApp
- VMware
- Huawei Technologies
- Directorate for Computer and Information Science and Engineering
- Defense Advanced Research Projects Agency, Award: FA8750-12-2-0331
- Samsung
- Division of Computing and Communication Foundations, Award: CCF-1139158