Towards End-To-End Speech Recognition with Recurrent Neural Networks

Graves, Alex; Jaitly, Navdeep

Article · Jun 21, 2014 · Closed access

Google (United Kingdom) · DeepMind (United Kingdom) · +1 more institution

Abstract

This paper presents a speech recognition system that directly transcribes audio data into text, without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function. A modification to the objective function is introduced that trains the network to minimise the expectation of an arbitrary transcription loss function. This allows a direct optimisation of the word error rate, even in the absence of a lexicon or language model. The system achieves a word error rate of 27.3% on the Wall Street Journal corpus with no prior linguistic information,…
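
The following is a minimal sketch of the quantities the abstract refers to. It is not the authors' implementation. It assumes character-level network outputs, a hypothetical `alphabet` list whose index 0 is the CTC blank and which includes a space character as the word separator, and per-frame softmax outputs `frame_probs` given as plain Python lists. All function names are hypothetical. The sketch shows three things. First, the CTC mapping B, which merges repeated labels and then deletes blanks. Second, the word error rate, computed as word-level edit distance. Third, a Monte Carlo estimate of the expected word error rate, obtained by sampling alignments from the network outputs. Sampling is only one simple way to approximate that expectation, and the paper's own estimator and its gradient are not reproduced here.

```python
import random
from typing import List, Optional, Sequence

# Assumption (hypothetical convention): label index 0 is the CTC blank.
BLANK = 0


def ctc_collapse(path: Sequence[int], blank: int = BLANK) -> List[int]:
    """CTC mapping B: merge runs of repeated labels, then delete blanks."""
    out: List[int] = []
    prev: Optional[int] = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            row[j] = min(
                prev_row[j] + 1,             # delete r
                row[j - 1] + 1,              # insert h
                prev_row[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev_row = row
    return prev_row[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def sample_alignment(frame_probs: Sequence[Sequence[float]], rng: random.Random) -> List[int]:
    """Draw one alignment by sampling a label independently at every frame."""
    return [rng.choices(range(len(p)), weights=p, k=1)[0] for p in frame_probs]


def expected_wer(
    frame_probs: Sequence[Sequence[float]],
    alphabet: Sequence[str],
    reference: str,
    num_samples: int = 100,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the expected WER under the per-frame output distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Sample an alignment, map it to a label sequence with B, then spell it out.
        labels = ctc_collapse(sample_alignment(frame_probs, rng))
        hypothesis = "".join(alphabet[k] for k in labels)
        total += word_error_rate(reference, hypothesis)
    return total / num_samples
```

CTC treats the output labels at different frames as conditionally independent given the input. Drawing one label per frame therefore samples an alignment from the network's distribution over alignments. Averaging the loss of the collapsed transcriptions gives an unbiased estimate of the expected loss, and its variance shrinks as `num_samples` grows.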

Citation impact

1,853 total citations

FWCI: 189.91
Percentile: 100%
References: 23

Authors: 2

Topics & keywords

Keywords

Computer science
Word error rate
Trigram
Speech recognition
Connectionism
Language model
Recurrent neural network
Artificial intelligence

UN Sustainable Development Goals

Peace, Justice and strong institutions
