Article · Jun 21, 2014 · Closed access

Towards End-To-End Speech Recognition with Recurrent Neural Networks

Google (United Kingdom) · DeepMind (United Kingdom) · +1 more institution

Abstract

This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function. A modification to the objective function is introduced that trains the network to minimise the expectation of an arbitrary transcription loss function. This allows a direct optimisation of the word error rate, even in the absence of a lexicon or language model. The system achieves a word error rate of 27.3% on the Wall Street Journal corpus with no prior linguistic information,…
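
For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the sketch covers only the evaluation metric, not the paper's training objective.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; whitespace tokenisation is an
    assumption and real evaluations usually normalise text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.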

Citation impact

Total citations: 1,853
FWCI: 189.91
Percentile: 100%
References: 23

Authors

2

Topics & keywords

Keywords
  • Computer science
  • Word error rate
  • Trigram
  • Speech recognition
  • Connectionism
  • Language model
  • Recurrent neural network
  • Artificial intelligence
UN Sustainable Development Goals
  • Peace, Justice and strong institutions