EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding

Miao, Yajie; Gowayyed, Mohammad; Metze, Florian

doi:10.1109/asru.2015.7404790

Preprint, Dec 1, 2015. Closed access.

Carnegie Mellon University

Indexed in Crossref

Abstract

The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. This paper presents our Eesen framework which drastically simplifies the existing pipeline to build state-of-the-art ASR systems. Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) predicting context-independent targets (phonemes or characters). To remove the need for pre-generated frame labels, we adopt the connectionist temporal classification (CTC) objective function to infer the…
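The CTC objective named in the abstract scores a target label sequence by summing the probabilities of every frame-level path that collapses to it. That is why training does not need pre-generated frame labels. Below is a minimal pure-Python sketch of the standard CTC forward recursion. It assumes blank index 0 and per-frame softmax outputs, and the names `BLANK`, `collapse`, `ctc_prob` and `ctc_loss` are hypothetical illustrations, not Eesen's own code.

```python
import math
from itertools import groupby

BLANK = 0  # assumption: label index 0 is reserved for the CTC blank


def collapse(path):
    """CTC many-to-one mapping: merge repeated labels, then drop blanks."""
    return [label for label, _ in groupby(path) if label != BLANK]


def ctc_prob(probs, labels):
    """Return P(labels | x), summed over every frame-level path that collapses to labels.

    probs  -- T x K nested list of per-frame softmax outputs (each row sums to 1)
    labels -- target label sequence, without blanks
    Standard CTC forward (alpha) recursion in probability space; a practical
    system would work in log space to avoid underflow on long utterances.
    """
    # Blank-augmented target: blank, l1, blank, l2, ..., lU, blank (length 2U + 1).
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    num_states = len(ext)

    # Frame 0: a valid path starts with the leading blank or the first label.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            # Skipping a blank is allowed only between two different non-blank labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * frame[ext[s]]
        alpha = new_alpha

    # A valid path ends on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)


def ctc_loss(probs, labels):
    """CTC objective for one utterance: negative log-likelihood of the target."""
    return -math.log(ctc_prob(probs, labels))
```

For example, `collapse([0, 1, 1, 0, 1, 2, 2, 0])` gives `[1, 1, 2]`: the blank between the two 1s is what keeps them as separate labels. With three frames and target `[1, 1]`, the only valid path is 1, blank, 1, so `ctc_prob` returns the product of those three frame probabilities.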

Citation impact

Total citations: 639
FWCI (field-weighted citation impact): 84.12
Percentile: 100%
References: 52

Keywords

Computer science
Decoding methods
Pipeline (software)
Recurrent neural network
Speech recognition
Context (archaeology)
Connectionism
Artificial intelligence

Funding

National Science Foundation