Preprint | Dec 1, 2015 | Closed access
EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding
Indexed in Crossref
Abstract
The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. This paper presents our Eesen framework which drastically simplifies the existing pipeline to build state-of-the-art ASR systems. Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) predicting context-independent targets (phonemes or characters). To remove the need for pre-generated frame labels, we adopt the connectionist temporal classification (CTC) objective function to infer the…
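As an illustration of the CTC objective mentioned in the abstract, the following is a minimal sketch of the standard CTC forward recursion, which sums the probability of every frame-level alignment that collapses to a given label sequence. This is not Eesen's implementation; the function name `ctc_neg_log_likelihood`, the choice of index 0 for the blank symbol, and the plain-list input format are assumptions made for the example.

```python
import math

def ctc_neg_log_likelihood(probs, labels, blank=0):
    """Negative log-probability of `labels` under the CTC model (illustrative sketch).

    probs  : list of T per-frame distributions, each a list of symbol probabilities
    labels : target symbol indices, without blanks
    blank  : index of the blank symbol (assumed to be 0 here)
    """
    # Extended target: a blank before, between, and after the labels.
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s] = total probability of all partial alignments ending at ext[s].
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    p = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return math.inf if p == 0.0 else -math.log(p)
```

With two frames, two symbols (blank and one label), and uniform frame probabilities, three of the four possible alignments collapse to the single label, so the sequence probability is 0.75. Because the sum runs over all alignments, no pre-generated frame-level labels are needed to compute the loss.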
Citation impact
- Total citations: 639
- FWCI: 84.12
- Percentile: 100%
- References: 52
Authors: 3
Topics & keywords
Keywords
- Computer science
- Decoding methods
- Pipeline (software)
- Recurrent neural network
- Speech recognition
- Context (archaeology)
- Connectionism
- Artificial intelligence