Unsupervised Data Augmentation for Consistency Training

Xie, Qizhe; Dai, Zihang; Hovy, Eduard; Luong, Minh-Thang; Le, Quoc V.

doi:10.48550/arxiv.1904.12848

Preprint; arXiv (Cornell University); Apr 29, 2019; Green OA

Indexed in: arXiv, DataCite

Abstract

Semi-supervised learning lately has shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. By substituting simple noising operations with advanced data augmentation methods such as RandAugment and back-translation, our method brings substantial improvements across six language and…
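As a rough illustration of the consistency-training idea the abstract describes, the sketch below combines a supervised loss on labeled examples with a penalty on how much the model's prediction changes when an unlabeled input is noised. It is not the authors' implementation: the function names, the use of KL divergence as the consistency measure, the `weight` parameter, and the toy `sign_flip` augmentation (a stand-in for methods such as RandAugment or back-translation) are all assumptions made for this example.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label, eps=1e-12):
    """Supervised loss for one labeled example."""
    return -math.log(probs[label] + eps)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); assumed here as the consistency measure."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_training_loss(model, labeled, unlabeled, augment, weight=1.0):
    """Supervised loss plus a weighted consistency penalty.

    model:     hypothetical callable mapping an input to a list of logits
    labeled:   list of (input, label) pairs
    unlabeled: list of inputs without labels
    augment:   hypothetical noising function applied to unlabeled inputs
    """
    supervised = sum(
        cross_entropy(softmax(model(x)), y) for x, y in labeled
    ) / len(labeled)

    consistency = 0.0
    for x in unlabeled:
        p_clean = softmax(model(x))           # prediction on the original input
        p_noised = softmax(model(augment(x)))  # prediction on the noised input
        consistency += kl_divergence(p_clean, p_noised)
    consistency /= len(unlabeled)

    return supervised + weight * consistency

# Toy setup (all values are made up for illustration).
def toy_model(x):
    return [x[0] + x[1], x[0] - x[1]]

def identity(x):
    return x

def sign_flip(x):
    return [x[0], -x[1]]

labeled = [([1.0, 0.5], 0)]
unlabeled = [[0.2, -0.3], [1.0, 1.0]]

loss_without_noise = consistency_training_loss(toy_model, labeled, unlabeled, identity)
loss_with_noise = consistency_training_loss(toy_model, labeled, unlabeled, sign_flip)
```

With the identity function as the augmentation, the consistency term vanishes and only the supervised loss remains. With `sign_flip`, the toy model's predictions change, so the penalty is positive; training would push the model toward predictions that are invariant to that noise.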

Citation impact

Total citations: 1,621

References: 74

Authors: 5

Topics & keywords

Keywords

Computer science
Labeled data
Consistency (knowledge bases)
Artificial intelligence
Machine learning
Benchmark (surveying)
Code (set theory)
Noisy data

UN Sustainable Development Goals

Quality Education
