Listen, attend and spell: A neural network for large vocabulary conversational speech recognition

Chan, William; Jaitly, Navdeep; Le, Quoc V.; Vinyals, Oriol

doi:10.1109/icassp.2016.7472621

Article · Mar 1, 2016 · Closed access

Carnegie Mellon University · Google (United States)

Indexed in Crossref

Abstract

We present Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs or other components of traditional speech recognizers. In LAS, the neural network architecture subsumes the acoustic, pronunciation and language models making it not only an end-to-end trained system but an end-to-end model. In contrast to DNN-HMM, CTC and most other models, LAS makes no independence assumptions about the probability distribution of the output character sequences given the acoustic sequence. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra…
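The abstract calls the listener a pyramidal recurrent encoder. In the full LAS paper, each pyramidal BLSTM layer reads the outputs of the layer below at two consecutive time steps, concatenated, so every layer halves the sequence length; stacking three such layers shortens the input by 2^3 = 8, and the speller's attention then runs over far fewer steps. The following is a minimal, hypothetical sketch of only that time-reduction step in plain Python: the recurrent layers themselves are omitted (treated as the identity), the names `concat_pairs` and `pyramid_reduce` are invented for illustration, and dropping a trailing unpaired frame is an assumption rather than something stated in the abstract.

```python
from typing import List

# A frame is one feature vector (e.g. filter bank energies for one time step).
Frame = List[float]


def concat_pairs(frames: List[Frame]) -> List[Frame]:
    """Concatenate frames 2i and 2i+1 into one vector, halving the length.

    A trailing unpaired frame is dropped (an assumption; padding would also work).
    """
    return [frames[2 * i] + frames[2 * i + 1] for i in range(len(frames) // 2)]


def pyramid_reduce(frames: List[Frame], num_layers: int = 3) -> List[Frame]:
    """Apply pairwise concatenation num_layers times.

    In LAS each reduction feeds a BLSTM layer; the recurrence is omitted here
    (treated as the identity) so only the change in time resolution is shown.
    """
    for _ in range(num_layers):
        frames = concat_pairs(frames)
    return frames


if __name__ == "__main__":
    # Hypothetical input: 80 time steps of 2-dimensional features.
    utterance = [[float(t), float(-t)] for t in range(80)]
    reduced = pyramid_reduce(utterance)
    print(len(reduced), len(reduced[0]))  # 10 steps, each 16-dimensional
```

With the default of three layers, an utterance of T frames comes out as roughly T/8 frames whose vectors are 8 times wider; in the actual model the BLSTM at each level also transforms each concatenated vector instead of passing it through unchanged.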

Citation impact

Total citations: 2,339

FWCI: 291.09
Percentile: 100%
References: 48

Keywords

Computer science
Speech recognition
Pronunciation
Hidden Markov model
Language model
Artificial neural network
Recurrent neural network
Character (mathematics)
