Get another label? Improving data quality and data mining using multiple, noisy labelers

Sheng, Victor S.; Provost, Foster; Ipeirotis, Panagiotis G.

doi:10.1145/1401890.1401965

Article | Aug 24, 2008 | Closed access

Victor S. Sheng, Foster Provost, Panagiotis G. Ipeirotis

New York University

Indexed in Crossref

Abstract

This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated-labeling can improve label quality…
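As a rough illustration of why repeated labeling may or may not improve label quality, the sketch below computes the probability that a simple majority vote over several noisy labels is correct. It rests on assumptions not stated in the truncated abstract: binary labels, independent labelers, a single shared per-label accuracy `p`, and majority voting as the combining rule. The function name `majority_vote_quality` is hypothetical.

```python
from math import comb


def majority_vote_quality(p: float, n_labels: int) -> float:
    """Probability that a majority vote over n_labels noisy binary labels is correct.

    Assumptions (illustrative, not taken from the abstract): labels are binary,
    labelers are independent, and each label is correct with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n_labels < 1 or n_labels % 2 == 0:
        raise ValueError("n_labels must be a positive odd number to avoid ties")

    # The vote is correct when more than half of the labels are correct.
    needed = n_labels // 2 + 1
    return sum(
        comb(n_labels, k) * p**k * (1.0 - p) ** (n_labels - k)
        for k in range(needed, n_labels + 1)
    )


if __name__ == "__main__":
    # Compare a labeler better than chance with one worse than chance.
    for p in (0.7, 0.4):
        for n in (1, 3, 5):
            print(f"p={p}, labels={n}: quality={majority_vote_quality(p, n):.3f}")
```

Under these assumptions, three labels at p = 0.7 give a majority-vote quality of about 0.784, while at p = 0.4 the quality drops below that of a single label, which is one way to read the "or lack thereof" qualifier.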

Citation impact

Total citations: 1,111

FWCI: 81.65
Percentile: 100%
References: 47

Authors: 3

Topics & keywords

Keywords

Computer science
Quality (philosophy)
Sequence labeling
Set (abstract data type)
Crowdsourcing
Artificial intelligence
Imperfect
Focus (optics)
