Preprint · Dec 1, 2015 · Closed access

EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding

Carnegie Mellon University

Indexed in Crossref

Abstract

The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. This paper presents our Eesen framework which drastically simplifies the existing pipeline to build state-of-the-art ASR systems. Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) predicting context-independent targets (phonemes or characters). To remove the need for pre-generated frame labels, we adopt the connectionist temporal classification (CTC) objective function to infer the…

Citation impact

Total citations: 639
FWCI: 84.12
Percentile: 100%
References: 52

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Decoding methods
  • Pipeline (software)
  • Recurrent neural network
  • Speech recognition
  • Context (archaeology)
  • Connectionism
  • Artificial intelligence

Funding