End-to-End Speech Recognition: A Survey
Google (United States) · Apple (United States) · +4 more institutions
Abstract
In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning has reduced word error rate by more than 50% relative, compared to modeling without deep learning. In the wake of this transition, a number of all-neural ASR architectures have been introduced. These so-called end-to-end (E2E) models are highly integrated, completely neural ASR models that rely strongly on general machine learning knowledge and learn more consistently from data, with lower dependence on ASR domain-specific experience. The success and enthusiastic adoption of deep learning, accompanied by more generic model architectures, has led to E2E models now becoming the…
Citation impact
- FWCI: 29.80
- Percentile: 100%
- References: 360
Authors: 5
Topics & keywords
- Computer science
- Hidden Markov model
- Deep learning
- Artificial neural network
- Language model
- Software deployment
- Artificial intelligence
- End-to-end principle
- Quality Education