Towards End-To-End Speech Recognition with Recurrent Neural Networks
Google (United Kingdom) · DeepMind (United Kingdom)
Abstract
This paper presents a speech recognition system that directly transcribes audio data into text, without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification (CTC) objective function. A modification to the objective function is introduced that trains the network to minimise the expectation of an arbitrary transcription loss function. This allows a direct optimisation of the word error rate, even in the absence of a lexicon or language model. The system achieves a word error rate of 27.3% on the Wall Street Journal corpus with no prior linguistic information,…
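The CTC objective named above scores a transcription by summing the probabilities of every framewise alignment (label or blank at each frame) that collapses to it once repeats are merged and blanks are removed. The following is a minimal pure-Python sketch of the standard CTC forward (alpha) recursion in log space. It assumes the per-frame softmax outputs are given as `probs[t][k]` with the blank at index 0; the names `collapse_path` and `ctc_log_likelihood` are hypothetical illustrations, not the paper's code.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def collapse_path(path: Sequence[int], blank: int = 0) -> List[int]:
    """CTC many-to-one map: merge repeated labels, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def ctc_log_likelihood(probs: Sequence[Sequence[float]],
                       labels: Sequence[int], blank: int = 0) -> float:
    """Return log p(labels | x), summing over all alignments via the forward recursion."""
    def lp(t: int, k: int) -> float:
        p = probs[t][k]
        return math.log(p) if p > 0 else NEG_INF

    # Extended sequence: blanks around and between labels (length 2U + 1).
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    # Alignments may start with a blank or with the first label.
    alpha = [NEG_INF] * S
    alpha[0] = lp(0, ext[0])
    if S > 1:
        alpha[1] = lp(0, ext[1])

    for t in range(1, len(probs)):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]                      # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])          # advance by one position
            # Skip a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + lp(t, ext[s])
        alpha = new

    # Alignments may end on the last label or on the trailing blank.
    if S == 1:
        return alpha[0]
    return logsumexp(alpha[S - 1], alpha[S - 2])
```

Working in log space avoids numerical underflow on long utterances; the negative of the returned value is the CTC loss for a single utterance. The recursion visits each (frame, extended-label) pair once, instead of enumerating every alignment.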
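The modification described in the abstract trains on the expected value of a transcription loss such as word error rate. A minimal Monte Carlo sketch of estimating that expectation follows. It assumes, for illustration, that alignments are sampled independently frame by frame from the network's per-frame softmax outputs and collapsed with the CTC rule, and that the output alphabet includes a space character marking word boundaries. The helper names (`word_edit_distance`, `word_error_rate`, `expected_wer`) are hypothetical. The sketch estimates only the value of the expected loss; the gradient estimator needed for training is not shown.

```python
import random
from typing import List, Sequence


def collapse_path(path: Sequence[int], blank: int = 0) -> List[int]:
    """CTC many-to-one map: merge repeated labels, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return word_edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def expected_wer(probs: Sequence[Sequence[float]], alphabet: Sequence[str],
                 reference: str, num_samples: int = 100, seed: int = 0) -> float:
    """Monte Carlo estimate of E[WER] under the framewise output distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Draw one symbol per frame from that frame's softmax output.
        path = [rng.choices(range(len(row)), weights=row)[0] for row in probs]
        text = "".join(alphabet[k] for k in collapse_path(path))
        total += word_error_rate(reference, text)
    return total / num_samples
```

With one-hot frames the sampled path is fixed, so the estimate equals the exact WER of that single transcription; with softer outputs, increasing `num_samples` reduces the variance of the estimate.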
Citation impact
- FWCI: 189.91
- Percentile: 100%
- References: 23
Topics & keywords
- Computer science
- Word error rate
- Trigram
- Speech recognition
- Connectionism
- Language model
- Recurrent neural network
- Artificial intelligence
- Peace, Justice and strong institutions