Article | IEEE Transactions on Knowledge and Data Engineering | Oct 8, 2019 | Closed access

A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective

Korea Advanced Institute of Science and Technology

Indexed in Crossref

Abstract

Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning becomes more widely used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning, deep learning techniques automatically generate features, which saves feature engineering costs but in return may require larger amounts of labeled data. Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management…

Citation impact

Total citations: 873
FWCI: 51.68
Percentile: 100%
References: 237

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Data collection
  • Big data
  • Data science
  • Machine learning
  • Artificial intelligence
  • Bottleneck
  • Data integration

Funding