A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective
Korea Advanced Institute of Science and Technology
Abstract
Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning is becoming more widely used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning, deep learning techniques automatically generate features, which saves feature engineering costs, but in return may require larger amounts of labeled data. Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management…
Citation impact
- FWCI: 51.68
- Percentile: 100%
- References: 237
Authors: 3
Topics & keywords
- Computer science
- Data collection
- Big data
- Data science
- Machine learning
- Artificial intelligence
- Bottleneck
- Data integration