Preprint | arXiv (Cornell University) | Apr 29, 2019 | Green OA

Unsupervised Data Augmentation for Consistency Training

Indexed in: arXiv, DataCite

Abstract

Semi-supervised learning has recently shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically the noise produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. By substituting simple noising operations with advanced data augmentation methods such as RandAugment and back-translation, our method brings substantial improvements across six language and…
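The consistency-training idea described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes the consistency term is a KL divergence between the model's prediction on an unlabeled example and its prediction on a noised (augmented) copy, added to a standard cross-entropy loss on labeled examples. The function names and the `weight` parameter are hypothetical, and the logits stand in for the outputs of whatever model is being trained.

```python
import math

def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class index."""
    return -math.log(probs[label] + eps)

def consistency_training_loss(labeled_logits, labels,
                              unlabeled_logits, noised_logits,
                              weight=1.0):
    """Supervised loss plus a weighted consistency penalty (illustrative).

    labeled_logits:   model outputs on labeled examples
    labels:           true class indices for those examples
    unlabeled_logits: model outputs on unlabeled examples
    noised_logits:    model outputs on noised copies of the same examples
    weight:           hypothetical trade-off coefficient (an assumption)
    """
    supervised = sum(cross_entropy(softmax(z), y)
                     for z, y in zip(labeled_logits, labels)) / len(labels)
    consistency = sum(kl_divergence(softmax(u), softmax(n))
                      for u, n in zip(unlabeled_logits, noised_logits)
                      ) / len(unlabeled_logits)
    return supervised + weight * consistency
```

When the model's predictions on an unlabeled example and on its noised copy agree exactly, the consistency term is zero and only the supervised loss remains; any disagreement adds a positive penalty, which is what pushes the predictions toward invariance to the noise.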

Citation impact

Total citations: 1,621
References: 74

Authors

5

Topics & keywords

Keywords
  • Computer science
  • Labeled data
  • Consistency (knowledge bases)
  • Artificial intelligence
  • Machine learning
  • Benchmark (surveying)
  • Code (set theory)
  • Noisy data
UN Sustainable Development Goals
  • Quality Education