Hybrid CTC/Attention Architecture for End-to-End Speech Recognition

Watanabe, Shinji; Hori, Takaaki; Kim, Suyoun; Hershey, John R.; Hayashi, Tomoki

doi:10.1109/jstsp.2017.2763455

Article · IEEE Journal of Selected Topics in Signal Processing · Oct 25, 2017 · Closed access

Mitsubishi Electric (United States) · Carnegie Mellon University · +1 more institution

Indexed in Crossref

Abstract

Conventional automatic speech recognition (ASR) based on a hidden Markov model (HMM)/deep neural network (DNN) is a very complicated system consisting of various modules such as acoustic, lexicon, and language models. It also requires linguistic resources, such as a pronunciation dictionary, tokenization, and phonetic context-dependency trees. On the other hand, end-to-end ASR has become a popular alternative to greatly simplify the model-building process of conventional ASR systems by representing complicated modules with a single deep network architecture, and by replacing the use of linguistic resources with a data-driven learning method. There are two major types of end-to-end architectures for ASR;…

Citation impact

Total citations: 828

FWCI: 48.71
Percentile: 100%
References: 59

Authors: 5

Topics & keywords

Keywords

Computer science
Speech recognition
Hidden Markov model
Decoding methods
End-to-end principle
Artificial intelligence
Robustness (evolution)

UN Sustainable Development Goals

Quality Education

Funding

H2020 European Research Council