SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Park, Daniel; Chan, William; Zhang, Yu; Chiu, Chung‐Cheng; Zoph, Barret; Cubuk, Ekin D.; Le, Quoc V.

doi:10.21437/interspeech.2019-2680

Article, Sep 13, 2019, Green OA

Google (United States)

Indexed in: arXiv, Crossref

Abstract

We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model. This compares to the previous…
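
The two masking operations named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `spec_augment_masks`, the default maximum mask widths, the use of a single mask per axis, and the choice of zero as the mask value are all assumptions made for illustration, and the feature-warping step is omitted entirely.

```python
import numpy as np


def spec_augment_masks(features, max_freq_width=8, max_time_width=20, rng=None):
    """Return a copy of `features` with one frequency mask and one time mask.

    `features` is assumed to be a 2-D array of shape
    (time_steps, freq_channels), e.g. log filter bank coefficients.
    All parameter names and defaults here are hypothetical.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(features, dtype=float, copy=True)
    num_steps, num_channels = out.shape

    # Frequency mask: zero out a block of f consecutive channels.
    f = int(rng.integers(0, min(max_freq_width, num_channels) + 1))
    f0 = int(rng.integers(0, num_channels - f + 1))
    out[:, f0:f0 + f] = 0.0

    # Time mask: zero out a block of t consecutive time steps.
    t = int(rng.integers(0, min(max_time_width, num_steps) + 1))
    t0 = int(rng.integers(0, num_steps - t + 1))
    out[t0:t0 + t, :] = 0.0

    return out
```

In a training loop, a function like this would typically be called on each utterance's features before batching, with a fresh random draw every time so that the network sees a different masked view on each pass.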

Citation impact

Total citations: 3,511

FWCI: 282.20
Percentile: 100%
References: 53

Authors: 7

Topics & keywords

Keywords

Speech recognition
Computer science
Language model
Masking (illustration)
Spell
Set (abstract data type)
Feature (linguistics)
Artificial neural network