Joint CTC-attention based end-to-end speech recognition using multi-task learning

Kim, Suyoun; Hori, Takaaki; Watanabe, Shinji

doi:10.1109/icassp.2017.7953075

Article · Mar 1, 2017 · Closed access

Carnegie Mellon University · Mitsubishi Electric (Japan)

Indexed in Crossref

Abstract

Recently, there has been increasing interest in end-to-end speech recognition, which directly transcribes speech to text without any predefined alignments. One approach is the attention-based encoder-decoder framework, which learns a mapping between variable-length input and output sequences in one step using a purely data-driven method. The attention model has often been shown to outperform another end-to-end approach, Connectionist Temporal Classification (CTC), mainly because it explicitly uses the history of target characters without any conditional independence assumptions. However, we observed that the attention model performs poorly in noisy conditions and is…
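The abstract contrasts CTC's conditional independence assumption with the attention decoder's use of the label history. The Python sketch below illustrates that contrast and one plausible way to combine the two objectives under multi-task learning. It is not the authors' implementation, and the abstract as shown is truncated before it describes the joint objective. The blank index 0, the toy probabilities, the helper names (`ctc_log_likelihood`, `attention_log_likelihood`, `joint_mtl_loss`), the omission of an end-of-sentence token, and the interpolation weight `lam` are all assumptions made for illustration.

```python
import math
from typing import Sequence

BLANK = 0  # assumption: symbol index 0 is the CTC blank


def ctc_log_likelihood(frame_probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """log p_ctc(y | x) via the CTC forward algorithm (probability domain, toy sizes only).

    frame_probs[t][k] is the per-frame posterior of symbol k at frame t. Under CTC's
    conditional independence assumption, an alignment's probability is just the product
    of these per-frame posteriors; no frame looks at previously emitted labels.
    """
    ext = [BLANK]  # extended sequence: blank, y1, blank, y2, ..., blank
    for y in labels:
        ext += [y, BLANK]
    S, T = len(ext), len(frame_probs)
    alpha = [0.0] * S
    alpha[0] = frame_probs[0][ext[0]]
    if S > 1:
        alpha[1] = frame_probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # A blank may be skipped only between two different non-blank labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * frame_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    prob = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return math.log(prob)


def attention_log_likelihood(step_probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """log p_att(y | x) = sum_u log p(y_u | y_<u, x).

    Each step_probs[u] is assumed to come from a decoder already conditioned on the
    previous labels, so no conditional independence assumption is made.
    """
    return sum(math.log(step_probs[u][y]) for u, y in enumerate(labels))


def joint_mtl_loss(frame_probs, step_probs, labels, lam: float = 0.5) -> float:
    """Hypothetical multi-task loss: lam * CTC loss + (1 - lam) * attention loss."""
    return -(lam * ctc_log_likelihood(frame_probs, labels)
             + (1.0 - lam) * attention_log_likelihood(step_probs, labels))


if __name__ == "__main__":
    frames = [[0.5, 0.4, 0.1], [0.6, 0.3, 0.1]]  # 2 frames over {blank, a, b}
    steps = [[0.1, 0.8, 0.1]]                     # 1 decoder step
    # Alignments of "a": (a, a), (a, blank), (blank, a) -> 0.12 + 0.24 + 0.15
    print(round(math.exp(ctc_log_likelihood(frames, [1])), 4))  # 0.51
    print(joint_mtl_loss(frames, steps, [1], lam=0.5))
```

With `lam = 1.0` the loss reduces to pure CTC and with `lam = 0.0` to pure attention; intermediate values let the monotonic CTC alignment regularize the more flexible attention decoder during training.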

Citation impact

Total citations: 906
FWCI: 75.11
Percentile: 100%
References: 22

Authors: 3

Keywords

Computer science
Robustness (evolution)
Speech recognition
End-to-end principle
Connectionism
Encoder
Artificial intelligence
Task (project management)
