A study on data augmentation of reverberant speech for robust speech recognition
Huawei Technologies (China) · Institute for Language and Speech Processing · Johns Hopkins University · Microsoft (United States)
Abstract
The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real RIRs can be difficult to acquire, and also the effect of adding point-source noises. We find that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added. Further we show that the trained acoustic models not only perform well in the distant-talking scenario…
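The augmentation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which the abstract does not spell out. It assumes that reverberation is applied by convolving dry speech with an RIR, and that a point-source noise is a noise recording convolved with its own RIR and mixed in at a chosen signal-to-noise ratio. The function names (`reverberate`, `scale_noise_to_snr`, `augment`, `toy_rir`), the 10 dB SNR, and the exponentially decaying random RIR are all hypothetical stand-ins chosen for illustration; a real setup would use measured RIRs or ones produced by a room simulator.

```python
import numpy as np


def reverberate(signal, rir):
    """Convolve a dry signal with an RIR and trim to the input length."""
    return np.convolve(signal, rir)[: len(signal)]


def scale_noise_to_snr(signal, noise, snr_db):
    """Scale noise so that the signal-to-noise power ratio equals snr_db."""
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return gain * noise


def augment(speech, speech_rir, noise, noise_rir, snr_db):
    """Reverberate speech and add one reverberated point-source noise."""
    reverberant = reverberate(speech, speech_rir)
    # Assumption: the noise source has its own position, hence its own RIR.
    noise = np.resize(noise, len(speech))  # repeat or truncate to match
    reverberant_noise = reverberate(noise, noise_rir)
    return reverberant + scale_noise_to_snr(reverberant, reverberant_noise, snr_db)


def toy_rir(length, decay, rng):
    """Hypothetical stand-in RIR: decaying random tail plus a direct path."""
    h = rng.standard_normal(length) * np.exp(-decay * np.arange(length))
    h[0] = 1.0
    return h


rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # placeholder for 1 s of 16 kHz speech
noise = rng.standard_normal(4000)     # placeholder for a short noise clip
speech_rir = toy_rir(2000, 0.005, rng)
noise_rir = toy_rir(2000, 0.005, rng)

augmented = augment(speech, speech_rir, noise, noise_rir, snr_db=10.0)
```

Here the SNR is measured against the reverberant speech rather than the dry speech; that is a design choice of this sketch, and other conventions are equally possible.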
Citation impact
- FWCI: 45.73
- Percentile: 100%
- References: 23
Authors
- Tom Ko (corresponding author), Huawei Technologies (China)
- Vijayaditya Peddinti, Institute for Language and Speech Processing
- Daniel Povey, Institute for Language and Speech Processing; Johns Hopkins University
- Michael L. Seltzer, Microsoft (United States)
- Sanjeev Khudanpur, Johns Hopkins University; Institute for Language and Speech Processing
Topics & keywords
- Robustness (evolution)
- Computer science
- Speech recognition
- Acoustic model
- Impulse response
- Training set
- Impulse (physics)
- Field (mathematics)
- Life in Land