Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

Dahl, George E.; Yu, Dong; Deng, Li; Acero, Alex

doi:10.1109/tasl.2011.2134090

Article · IEEE Transactions on Audio, Speech, and Language Processing · Apr 6, 2011 · Closed access

George E. Dahl · Dong Yu · Li Deng · Alex Acero

University of Toronto · Microsoft (United States)

Indexed in Crossref

Abstract

We propose a novel context-dependent (CD) model for large-vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices…

Citation impact

Total citations: 3,075

FWCI: 204.83
Percentile: 100%
References: 109

Authors

4

Topics & keywords

Keywords

Hidden Markov model
Computer science
Speech recognition
Word error rate
Artificial intelligence
Artificial neural network
Context (archaeology)
Mixture model

UN Sustainable Development Goals

Quality Education
