Article · Aug 24, 2008 · Closed access

Get another label? Improving data quality and data mining using multiple, noisy labelers

New York University


Abstract

This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated-labeling can improve label quality…
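The idea that repeated labeling can improve label quality, but does not always do so, can be made concrete with a simple model. The sketch below assumes binary labels, labelers who are each independently correct with the same probability p, and aggregation by majority vote over an odd number n of labels. These are illustrative assumptions, not a claim about the exact strategies in the paper, whose abstract is truncated above. The function name `majority_vote_quality` is hypothetical.

```python
from math import comb


def majority_vote_quality(p: float, n: int) -> float:
    """Probability that a majority vote of n labels is correct.

    Assumes (hypothetically) binary labels, n independent labelers that are
    each correct with probability p, and an odd n so ties cannot occur.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    majority = n // 2 + 1  # minimum number of correct votes needed to win
    # Sum binomial probabilities of receiving at least `majority` correct votes.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(majority, n + 1))


if __name__ == "__main__":
    # Compare a single label against majority votes over 3, 5 and 11 labels.
    for p in (0.6, 0.8, 0.4):
        qualities = [round(majority_vote_quality(p, n), 3) for n in (1, 3, 5, 11)]
        print(f"p={p}: {qualities}")
```

Under these assumptions, three labels at p = 0.8 raise quality from 0.8 to 0.896, and at p = 0.6 from 0.6 to 0.648. At p = 0.4, however, majority voting lowers quality to 0.352. This is consistent with the abstract's caveat that repeated labeling does not always help.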

Citation impact

  • Total citations: 1,111
  • FWCI (field-weighted citation impact): 81.65
  • Percentile: 100%
  • References: 47

Authors: 3

Topics & keywords

Keywords
  • Computer science
  • Quality (philosophy)
  • Sequence labeling
  • Set (abstract data type)
  • Crowdsourcing
  • Artificial intelligence
  • Imperfect
  • Focus (optics)
