Article | Proceedings of the VLDB Endowment | Nov 1, 2017 | Green OA

Snorkel

Stanford University

Indexed in: arXiv, Crossref, PubMed

Abstract

Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research…
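
To make the idea of a labeling function concrete, here is a minimal Python sketch. It does not use the Snorkel library or its API: the function names, the keyword heuristics, and the label constants are all hypothetical and chosen only for illustration. The votes are combined with a plain majority vote, which is a deliberately simple stand-in and not the data programming approach the abstract describes, where the accuracies and correlations of the labeling functions are estimated without ground truth.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label constants; a labeling function may also abstain.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

LabelingFunction = Callable[[str], int]


def lf_contains_causes(text: str) -> int:
    """Heuristic: the word 'causes' suggests a positive relation."""
    return POSITIVE if "causes" in text.lower() else ABSTAIN


def lf_contains_negation(text: str) -> int:
    """Heuristic: explicit negation suggests a negative relation."""
    lowered = text.lower()
    if "does not" in lowered or "no evidence" in lowered:
        return NEGATIVE
    return ABSTAIN


def lf_contains_induced(text: str) -> int:
    """Heuristic: the word 'induced' suggests a positive relation."""
    return POSITIVE if "induced" in text.lower() else ABSTAIN


LABELING_FUNCTIONS: List[LabelingFunction] = [
    lf_contains_causes,
    lf_contains_negation,
    lf_contains_induced,
]


def apply_lfs(texts: List[str], lfs: List[LabelingFunction]) -> List[List[int]]:
    """Build a label matrix: one row per text, one column per labeling function."""
    return [[lf(text) for lf in lfs] for text in texts]


def majority_vote(votes: List[int]) -> int:
    """Combine one row of votes, ignoring abstentions; ties and empty rows abstain.

    This is a simple baseline, NOT the denoising model described in the abstract.
    """
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    ranked = counted.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    texts = [
        "Aspirin causes stomach bleeding.",
        "There is no evidence that drug X causes nausea.",
        "The weather was nice.",
    ]
    for text, row in zip(texts, apply_lfs(texts, LABELING_FUNCTIONS)):
        print(row, majority_vote(row), text)
```

The second example sentence shows why unweighted voting is limiting: two heuristics conflict and the vote abstains, whereas a model that weights each function by an estimated accuracy could still produce a label.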

Citation impact

Total citations: 749
FWCI: 59.10
Percentile: 100%
References: 85

Authors

6 authors

Topics & keywords

Keywords
  • Computer science
  • Pipeline (software)
  • Heuristics
  • Bottleneck
  • Machine learning
  • Artificial intelligence
  • Heuristic
  • Ground truth