Article | Aug 24, 2008 | Closed access
Get another label? Improving data quality and data mining using multiple, noisy labelers
Indexed in Crossref
Abstract
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated-labeling can improve label quality…
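The abstract is cut off before it says how repeated labels are combined, so the following is only an illustrative sketch of why repeated labeling can help or hurt. It assumes binary labels, an odd number `n` of independent labelers who are each correct with the same probability `p`, and simple majority voting; the function name `majority_vote_quality` and all of these assumptions are hypothetical and are not taken from the text above.

```python
from math import comb


def majority_vote_quality(p: float, n: int) -> float:
    """Probability that a majority vote over n noisy labels is correct.

    Assumptions (illustrative, not from the abstract): binary labels,
    n odd, labelers independent, each correct with probability p.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    needed = n // 2 + 1  # smallest number of correct votes that wins
    # Sum the binomial probabilities of getting at least `needed` correct votes.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1)
    )


if __name__ == "__main__":
    for p in (0.4, 0.5, 0.7):
        row = ", ".join(f"n={n}: {majority_vote_quality(p, n):.3f}" for n in (1, 3, 5))
        print(f"p={p}: {row}")
```

Under these assumptions, three labelers with p = 0.7 give a majority-vote quality of 0.784, while labelers with p = 0.4 drop to 0.352, which is consistent with the abstract's phrase "improvement (or lack thereof)".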
Citation impact
- Total citations: 1,111
- FWCI: 81.65
- Percentile: 100%
- References: 47
Authors: 3
Topics & keywords
Keywords
- Computer science
- Quality (philosophy)
- Sequence labeling
- Set (abstract data type)
- Crowdsourcing
- Artificial intelligence
- Imperfect
- Focus (optics)