Hybrid CTC/Attention Architecture for End-to-End Speech Recognition
Mitsubishi Electric (United States) · Carnegie Mellon University · +1 more institution
Abstract
Conventional automatic speech recognition (ASR) based on a hidden Markov model (HMM)/deep neural network (DNN) is a very complicated system consisting of various modules such as acoustic, lexicon, and language models. It also requires linguistic resources, such as a pronunciation dictionary, tokenization, and phonetic context-dependency trees. On the other hand, end-to-end ASR has become a popular alternative to greatly simplify the model-building process of conventional ASR systems by representing complicated modules with a single deep network architecture, and by replacing the use of linguistic resources with a data-driven learning method. There are two major types of end-to-end architectures for ASR;…
Citation impact
- FWCI: 48.71
- Percentile: 100%
- References: 59
Authors
- 5
Topics & keywords
- Computer science
- Speech recognition
- Hidden Markov model
- Decoding methods
- End-to-end principle
- Artificial intelligence
- Robustness (evolution)
- Quality Education