Article · Sep 6, 2015 · Closed access
Audio augmentation for speech recognition
Indexed in Crossref
Abstract
Data augmentation is a common strategy adopted to increase the quantity of training data, avoid overfitting and improve robustness of the models. In this paper, we investigate audio-level speech augmentation methods which directly process the raw signal. The method we particularly recommend is to change the speed of the audio signal, producing 3 versions of the original signal with speed factors of 0.9, 1.0 and 1.1. The proposed technique has a low implementation cost, making it easy to adopt. We present results on 4 different LVCSR tasks with training data ranging from 100 hours to 1000 hours, to examine the effectiveness of audio augmentation in a variety of data scenarios. An average relative improvement of…
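The speed change described in the abstract can be illustrated with a minimal sketch. It assumes the speed change is done by resampling the waveform, which alters both duration and pitch, and it uses linear interpolation for brevity. The function names `change_speed` and `augment_with_speed` are hypothetical and do not come from the paper; a production pipeline would more likely rely on a dedicated resampler.

```python
import numpy as np


def change_speed(signal, factor):
    """Return `signal` played `factor` times faster (hypothetical helper).

    Assumption: speed is changed by resampling with linear interpolation,
    so a factor above 1 shortens the signal and a factor below 1 lengthens it.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(signal) / factor))
    # Fractional positions in the original signal read by each output sample.
    positions = np.arange(n_out) * factor
    # np.interp holds the last sample for positions past the end of the input.
    return np.interp(positions, np.arange(len(signal)), signal)


def augment_with_speed(signal, factors=(0.9, 1.0, 1.1)):
    """Produce one copy of `signal` per speed factor, keyed by factor."""
    return {f: change_speed(signal, f) for f in factors}


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate, for illustration only
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    for factor, copy in augment_with_speed(tone).items():
        print(factor, len(copy) / sample_rate, "seconds")
```

With the three factors from the abstract, each utterance yields three training copies, one of which (factor 1.0) is identical to the original.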
Citation impact
- Total citations: 1,144
- FWCI: 38.25
- Percentile: 100%
- References: 15
Authors: 4
Topics & keywords
Topics
Keywords
- Speech recognition
- Computer science
- Audio mining
- Speech processing
- Acoustic model