Article · May 1, 2013 · Closed access
Ideal ratio mask estimation using deep neural networks for robust speech recognition
Indexed in Crossref
Abstract
We propose a feature enhancement algorithm to improve the robustness of automatic speech recognition (ASR). The algorithm uses deep neural networks to estimate a smoothed ideal ratio mask (IRM) in the Mel frequency domain from a set of time-frequency (T-F) unit-level features that were previously used to estimate the ideal binary mask (IBM). The estimated IRM filters noise out of a noisy Mel spectrogram before cepstral features are extracted for ASR. On the noisy subset of the Aurora-4 robust ASR corpus, the proposed enhancement reduces the word error rate by over 38% relative when the ASR models are trained in clean conditions, and by over 14% relative when the models are trained using the…
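The pipeline the abstract outlines (estimate a mask in the Mel domain, filter the noisy Mel spectrogram point-wise, then extract cepstra) can be sketched as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The IRM form (S/(S+N))^β with β = 0.5, the single frame-level feed-forward estimator with sigmoid units and random (untrained) weights, the log-Mel input features standing in for the paper's T-F unit-level features, the choice of 13 cepstral coefficients, and all function names are hypothetical. The paper's mask smoothing is not reproduced.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in silent T-F units


def ideal_ratio_mask(speech_mel, noise_mel, beta=0.5):
    """Oracle IRM per Mel T-F unit: (S / (S + N)) ** beta.

    Expects non-negative Mel-domain energies of shape (frames, channels).
    beta=0.5 is an assumption; the paper's smoothed definition may differ.
    """
    return (speech_mel / (speech_mel + noise_mel + EPS)) ** beta


def apply_mask(noisy_mel, mask):
    """Filter the noisy Mel spectrogram by point-wise multiplication."""
    return noisy_mel * mask


def dct2_ortho_matrix(n_in, n_out):
    """Rows are the first n_out orthonormal DCT-II basis vectors of length n_in."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    mat = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0] /= np.sqrt(2.0)  # DC-row scaling makes the full matrix orthonormal
    return mat


def mel_cepstra(mel, n_ceps=13, floor=1e-10):
    """Cepstral features: log-compress Mel energies, then DCT across channels."""
    log_mel = np.log(np.maximum(mel, floor))
    return log_mel @ dct2_ortho_matrix(mel.shape[1], n_ceps).T


def mlp_mask(features, layers):
    """Forward pass of a small feed-forward net with sigmoid units.

    The sigmoid output keeps every mask value in (0, 1). `layers` is a list
    of (W, b) pairs; in this sketch they are random, i.e. untrained.
    """
    h = features
    for W, b in layers:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return h


rng = np.random.default_rng(0)
T, C = 50, 26  # frames x Mel channels (hypothetical sizes)
speech = rng.gamma(2.0, 1.0, size=(T, C))
noise = rng.gamma(2.0, 0.5, size=(T, C))
noisy = speech + noise  # approximation: ignores speech/noise cross terms

# Oracle path: the training target a network would learn to predict.
irm = ideal_ratio_mask(speech, noise)
ceps_oracle = mel_cepstra(apply_mask(noisy, irm))

# Estimated path: log noisy Mel frames stand in for the T-F unit features.
layers = [
    (rng.normal(scale=0.1, size=(C, 64)), np.zeros(64)),
    (rng.normal(scale=0.1, size=(64, C)), np.zeros(C)),
]
est_mask = mlp_mask(np.log(noisy), layers)
ceps_est = mel_cepstra(apply_mask(noisy, est_mask))
```

Applying the mask in the linear Mel-energy domain, before the log and DCT, follows the ordering the abstract describes. With β = 1 and exactly additive energies, the oracle mask returns the clean Mel energies.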
Citation impact
- Total citations: 553
- FWCI (field-weighted citation impact): 28.98
- Percentile: 100%
- References: 27
Authors: 2
Topics & keywords
Keywords
- Spectrogram
- Computer science
- Speech recognition
- Artificial intelligence
- Pattern recognition (psychology)
- Cepstrum
- Mel-frequency cepstrum
- Feature extraction