Audio augmentation for speech recognition

Ko, Tom; Peddinti, Vijayaditya; Povey, Daniel; Khudanpur, Sanjeev

doi:10.21437/interspeech.2015-711

Article · Sep 6, 2015 · Closed access

Johns Hopkins University

Indexed in Crossref

Abstract

Data augmentation is a common strategy adopted to increase the quantity of training data, avoid overfitting and improve robustness of the models. In this paper, we investigate audio-level speech augmentation methods which directly process the raw signal. The method we particularly recommend is to change the speed of the audio signal, producing 3 versions of the original signal with speed factors of 0.9, 1.0 and 1.1. The proposed technique has a low implementation cost, making it easy to adopt. We present results on 4 different LVCSR tasks with training data ranging from 100 hours to 1000 hours, to examine the effectiveness of audio augmentation in a variety of data scenarios. An average relative improvement of…
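The three-way speed perturbation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes speed change is done by resampling with linear interpolation in NumPy, and the function names `speed_perturb` and `three_way_augment` are hypothetical. Changing speed this way alters both the duration and the pitch of the signal.

```python
import numpy as np

def speed_perturb(signal, factor):
    """Return `signal` resampled so that it plays `factor` times faster
    when read back at the original sample rate (assumed approach:
    linear interpolation, not a production-quality resampler)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    signal = np.asarray(signal, dtype=np.float64)
    # A faster signal is shorter: output length is input length / factor.
    n_out = int(round(len(signal) / factor))
    # Fractional positions in the original signal read by each output sample.
    src_positions = np.arange(n_out) * factor
    # np.interp clamps positions past the last sample to the final value.
    return np.interp(src_positions, np.arange(len(signal)), signal)

def three_way_augment(signal, factors=(0.9, 1.0, 1.1)):
    """Produce one copy of the signal per speed factor (0.9, 1.0, 1.1 by default)."""
    return {f: speed_perturb(signal, f) for f in factors}

if __name__ == "__main__":
    sample_rate = 16000  # illustrative value, not taken from the source
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    for factor, audio in three_way_augment(tone).items():
        print(factor, len(audio))
```

With the default factors, the training set grows to three copies of each utterance, with the 1.0 copy identical to the original.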

Citation impact

Total citations: 1,144

FWCI: 38.25
Percentile: 100%
References: 15

Authors: 4

Topics & keywords

Keywords

Speech recognition
Computer science
Audio mining
Speech processing
Acoustic model
