Article · IEEE Journal of Selected Topics in Signal Processing · Oct 25, 2017 · Closed access

Hybrid CTC/Attention Architecture for End-to-End Speech Recognition

Mitsubishi Electric (United States) · Carnegie Mellon University · +1 more institution

Indexed in Crossref

Abstract

Conventional automatic speech recognition (ASR) based on a hidden Markov model (HMM)/deep neural network (DNN) is a very complicated system consisting of various modules such as acoustic, lexicon, and language models. It also requires linguistic resources, such as a pronunciation dictionary, tokenization, and phonetic context-dependency trees. On the other hand, end-to-end ASR has become a popular alternative to greatly simplify the model-building process of conventional ASR systems by representing complicated modules with a single deep network architecture, and by replacing the use of linguistic resources with a data-driven learning method. There are two major types of end-to-end architectures for ASR;…

Citation impact

  • Total citations: 828
  • FWCI: 48.71
  • Percentile: 100%
  • References: 59

Authors

5

Topics & keywords

Keywords
  • Computer science
  • Speech recognition
  • Hidden Markov model
  • Decoding methods
  • End-to-end principle
  • Artificial intelligence
  • Robustness (evolution)
UN Sustainable Development Goals
  • Quality Education

Funding