Conversational speech transcription using context-dependent deep neural networks

Seide, Frank; Li, Gang; Yu, Dong

doi:10.21437/interspeech.2011-169

Article · Aug 27, 2011 · Closed access



Indexed in Crossref

Abstract

We apply the recently proposed Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, to speech-to-text transcription. For single-pass speaker-independent recognition on the RT03S Fisher portion of the phone-call transcription benchmark (Switchboard), the word-error rate is reduced from 27.4%, obtained by discriminatively trained Gaussian-mixture HMMs, to 18.5%, a 33% relative improvement. CD-DNN-HMMs combine classic artificial-neural-network HMMs with traditional tied-state triphones and deep-belief-network pre-training. They had previously been shown to reduce errors by 16% relative when trained on tens of hours of data using hundreds of tied states. This paper takes CD-DNN-HMMs further and applies them…
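The relative improvement quoted in the abstract is the absolute drop in word-error rate (WER) divided by the baseline WER. The following is a minimal Python sketch of that arithmetic, using only the two WERs reported above; the function name `relative_reduction` is illustrative and does not come from the paper.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Return the relative error reduction as a fraction of the baseline error."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - new) / baseline


# Word-error rates (%) on the RT03S Fisher portion, as quoted in the abstract.
gmm_hmm_wer = 27.4     # discriminatively trained Gaussian-mixture HMM baseline
cd_dnn_hmm_wer = 18.5  # CD-DNN-HMM

rel = relative_reduction(gmm_hmm_wer, cd_dnn_hmm_wer)
# Absolute drop in WER points, then the relative reduction as a percentage.
print(f"absolute: {gmm_hmm_wer - cd_dnn_hmm_wer:.1f} points, relative: {rel:.1%}")
```

Run as written, this prints an absolute drop of 8.9 points and a relative reduction of about 32.5%, roughly one third, which the abstract reports as a 33% relative improvement.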

Citation impact

Total citations: 880

FWCI: 110.33
Percentile: 100%
References: 0



Topics & keywords

Topics

Speech Recognition and Synthesis (80%)

Keywords

Computer science
Transcription (linguistics)
Speech recognition
Artificial neural network
Artificial intelligence
Natural language processing
