Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription

Seide, Frank; Li, Gang; Xie, Chen; Yu, Dong

doi:10.1109/asru.2011.6163899

Article · Dec 1, 2011 · Closed access

Microsoft Research Asia (China) · Tsinghua University · +1 more institution

Indexed in Crossref

Abstract

We investigate the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective. Recently, we showed that for speaker-independent transcription of phone calls (NIST RT03S Fisher data), CD-DNN-HMMs reduced the word error rate by as much as one third, from 27.4% (obtained by discriminatively trained Gaussian-mixture HMMs with HLDA features) to 18.5%, using 300+ hours of training data (Switchboard), 9000+ tied triphone states, and up to 9 hidden network layers.
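
The "one third" figure in the abstract is a relative reduction, not an absolute one. As a quick arithmetic check, the sketch below recomputes it from the two word error rates quoted above. The function name `relative_reduction` and the variable names are illustrative assumptions, not anything taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the reduction of `improved` relative to `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Word error rates quoted in the abstract, in percent.
gmm_hmm_wer = 27.4      # discriminatively trained GMM-HMM baseline with HLDA features
cd_dnn_hmm_wer = 18.5   # CD-DNN-HMM result

reduction = relative_reduction(gmm_hmm_wer, cd_dnn_hmm_wer)
print(f"{reduction:.1%}")  # prints "32.5%"
```

The absolute improvement is 8.9 percentage points. Dividing by the 27.4% baseline gives roughly 32.5%, which is what the abstract rounds to "as much as one third".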

Citation impact

Total citations: 638

FWCI: 79.12
Percentile: 100%
References: 25

Authors: 4

Topics & keywords

Keywords

Speech recognition
Computer science
NIST
Transcription (linguistics)
Artificial neural network
Phone
Artificial intelligence
Feature (linguistics)

UN Sustainable Development Goals

Reduced inequalities

Funding

National Institute of Standards and Technology