Ideal ratio mask estimation using deep neural networks for robust speech recognition

Narayanan, Arun; Wang, DeLiang

doi:10.1109/icassp.2013.6639038

Article | May 1, 2013 | Closed access

Arun Narayanan, DeLiang Wang

The Ohio State University

Indexed in Crossref

Abstract

We propose a feature enhancement algorithm to improve robust automatic speech recognition (ASR). The algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit level features that has previously been used to estimate the ideal binary mask. The estimated IRM is used to filter out noise from a noisy Mel spectrogram before performing cepstral feature extraction for ASR. On the noisy subset of the Aurora-4 robust ASR corpus, the proposed enhancement obtains a relative improvement of over 38% in terms of word error rates using ASR models trained in clean conditions, and an improvement of over 14% when the models are trained using the…
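The pipeline the abstract describes (mask the noisy Mel spectrogram, then extract cepstral features) can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the paper: the mask is computed directly from known clean and noise energies as S / (S + N), a common ratio-mask definition, whereas the paper estimates a smoothed mask with a deep neural network; the toy spectrogram values, the function names, and the unnormalized DCT-II used for the cepstral step are all hypothetical choices made for illustration.

```python
import math

def ideal_ratio_mask(clean, noise, eps=1e-12):
    """Oracle ratio mask S / (S + N) per time-frequency unit.

    Assumption: this common definition stands in for the paper's
    smoothed, DNN-estimated mask, which the abstract does not specify.
    """
    return [[s / (s + n + eps) for s, n in zip(c_row, n_row)]
            for c_row, n_row in zip(clean, noise)]

def apply_mask(noisy, mask):
    """Scale each noisy Mel energy by its mask value."""
    return [[y * m for y, m in zip(y_row, m_row)]
            for y_row, m_row in zip(noisy, mask)]

def cepstra(mel_frame, num_ceps, floor=1e-10):
    """Log-compress one Mel frame, then apply an unnormalized DCT-II."""
    log_e = [math.log(max(e, floor)) for e in mel_frame]
    n = len(log_e)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(log_e))
            for k in range(num_ceps)]

# Hypothetical toy data: 2 frames x 4 Mel channels of energies.
clean = [[4.0, 1.0, 0.0, 2.0], [3.0, 3.0, 1.0, 0.5]]
noise = [[1.0, 1.0, 2.0, 0.0], [1.0, 0.0, 1.0, 0.5]]
# Assumption: clean and noise energies add in the Mel domain.
noisy = [[s + n for s, n in zip(c_row, n_row)]
         for c_row, n_row in zip(clean, noise)]

mask = ideal_ratio_mask(clean, noise)
enhanced = apply_mask(noisy, mask)
features = [cepstra(frame, 3) for frame in enhanced]
```

Because the mask here is computed from the true clean and noise energies, the enhanced spectrogram recovers the clean one almost exactly; with an estimated mask, as in the paper, the recovery would only be approximate.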

Citation impact

Total citations: 553

FWCI: 28.98
Percentile: 100%
References: 27

Authors: 2

Topics & keywords

Keywords

Spectrogram
Computer science
Speech recognition
Artificial intelligence
Pattern recognition (psychology)
Cepstrum
Mel-frequency cepstrum
Feature extraction

UN Sustainable Development Goals

Peace, Justice and strong institutions
