Audio-visual speech recognition using deep learning
Waseda University · Kyoto University · +1 more institution
Abstract
An audio-visual speech recognition (AVSR) system is thought to be one of the most promising solutions for reliable speech recognition, particularly when the audio is corrupted by noise. However, careful selection of sensory features is crucial for attaining high recognition performance. In the machine-learning community, deep learning approaches have recently attracted increasing attention because deep neural networks can effectively extract robust latent features that enable various recognition algorithms to demonstrate revolutionary generalization capabilities under diverse application conditions. This study introduces a connectionist-hidden Markov model (HMM) system for noise-robust AVSR. First, a deep…
Citation impact
- FWCI: 18.27
- Percentile: 100%
- References: 56
Authors: 5
Topics & keywords
- Computer science
- Speech recognition
- Artificial intelligence
- Hidden Markov model
- Convolutional neural network
- Pattern recognition (psychology)
- Deep learning
- Noise (video)
- Peace, Justice and strong institutions