Article · Dec 1, 2011 · Closed access

Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription

Microsoft Research Asia (China) · Tsinghua University · +1 more institution

Indexed in Crossref

Abstract

We investigate the potential of Context-Dependent Deep-Neural-Network HMMs (CD-DNN-HMMs) from a feature-engineering perspective. We recently showed that, for speaker-independent transcription of phone calls (NIST RT03S Fisher data), CD-DNN-HMMs reduced the word error rate by as much as one third, from 27.4% (obtained by discriminatively trained Gaussian-mixture HMMs with HLDA features) to 18.5%, using 300+ hours of training data (Switchboard), 9000+ tied triphone states, and up to 9 hidden network layers.
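The "one third" figure is a relative reduction, not an absolute one. As a quick check, the sketch below recomputes it from the two word error rates quoted in the abstract. The function name `relative_wer_reduction` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the reduction in word error rate as a fraction of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Word error rates (in percent) quoted in the abstract for NIST RT03S Fisher data.
gmm_hmm_wer = 27.4      # discriminatively trained Gaussian-mixture HMMs with HLDA features
cd_dnn_hmm_wer = 18.5   # CD-DNN-HMMs

reduction = relative_wer_reduction(gmm_hmm_wer, cd_dnn_hmm_wer)
absolute = gmm_hmm_wer - cd_dnn_hmm_wer
print(f"absolute: {absolute:.1f} points, relative: {reduction:.1%}")
# prints "absolute: 8.9 points, relative: 32.5%"
```

An absolute drop of 8.9 points against a 27.4% baseline is roughly a 32.5% relative reduction, which is consistent with the abstract's "as much as one third".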

Citation impact

  • Total citations: 638
  • FWCI: 79.12
  • Percentile: 100%
  • References: 25

Authors

4

Topics & keywords

Keywords
  • Speech recognition
  • Computer science
  • NIST
  • Transcription (linguistics)
  • Artificial neural network
  • Phone
  • Artificial intelligence
  • Feature (linguistics)
UN Sustainable Development Goals
  • Reduced inequalities

Funding