Article · May 1, 2013 · Closed access

Ideal ratio mask estimation using deep neural networks for robust speech recognition

The Ohio State University

Indexed in Crossref

Abstract

We propose a feature enhancement algorithm to improve robust automatic speech recognition (ASR). The algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit level features that has previously been used to estimate the ideal binary mask. The estimated IRM is used to filter out noise from a noisy Mel spectrogram before performing cepstral feature extraction for ASR. On the noisy subset of the Aurora-4 robust ASR corpus, the proposed enhancement obtains a relative improvement of over 38% in terms of word error rates using ASR models trained in clean conditions, and an improvement of over 14% when the models are trained using the…
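The pipeline described in the abstract (mask the noisy Mel spectrogram, then extract cepstral features) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not define the mask, so the sketch assumes one common definition of a ratio mask, clean energy divided by clean plus noise energy, and computes it from known clean and noise energies as a stand-in for the deep neural network estimate. The smoothing step, the function names, and the toy numbers are all hypothetical.

```python
import math

EPS = 1e-12  # guards against division by zero in silent units


def ideal_ratio_mask(clean_mel, noise_mel):
    """Oracle ratio mask per time-frequency unit (assumed definition):
    clean energy / (clean energy + noise energy), a value in [0, 1]."""
    return [
        [s / (s + n + EPS) for s, n in zip(s_frame, n_frame)]
        for s_frame, n_frame in zip(clean_mel, noise_mel)
    ]


def apply_mask(noisy_mel, mask):
    """Filter the noisy Mel spectrogram by element-wise multiplication."""
    return [
        [m * y for m, y in zip(m_frame, y_frame)]
        for m_frame, y_frame in zip(mask, noisy_mel)
    ]


def cepstra(mel_frame, num_ceps):
    """Cepstral coefficients of one frame: unnormalized DCT-II of the
    log Mel energies (energies are floored before taking the log)."""
    n = len(mel_frame)
    log_e = [math.log(max(e, 1e-10)) for e in mel_frame]
    return [
        sum(log_e[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(num_ceps)
    ]


# Toy data: one frame, three Mel channels (hypothetical values).
clean_mel = [[4.0, 1.0, 0.0]]
noise_mel = [[1.0, 1.0, 2.0]]
# Assumption: clean and noise energies add in the Mel domain.
noisy_mel = [[s + n for s, n in zip(clean_mel[0], noise_mel[0])]]

mask = ideal_ratio_mask(clean_mel, noise_mel)
enhanced_mel = apply_mask(noisy_mel, mask)
features = [cepstra(frame, num_ceps=2) for frame in enhanced_mel]
```

In the proposed system the mask would come from a trained network fed with time-frequency unit level features rather than from the clean and noise signals, which are not available at test time.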

Citation impact

Total citations: 553
FWCI: 28.98
Percentile: 100%
References: 27

Authors: 2

Topics & keywords

Keywords
  • Spectrogram
  • Computer science
  • Speech recognition
  • Artificial intelligence
  • Pattern recognition (psychology)
  • Cepstrum
  • Mel-frequency cepstrum
  • Feature extraction
UN Sustainable Development Goals
  • Peace, Justice and strong institutions