Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition

Abdel‐Hamid, Ossama; Mohamed, Abdelrahman; Jiang, Hui; Penn, Gerald

doi:10.1109/icassp.2012.6288864

Article · Mar 1, 2012 · Closed access

York University · University of Toronto

Indexed in Crossref

Abstract

Convolutional Neural Networks (CNNs) have shown success in achieving translation invariance for many image processing tasks. This success is largely attributed to the use of local filtering and max-pooling in the CNN architecture. In this paper, we propose to apply CNNs to speech recognition within the framework of the hybrid NN-HMM model. We propose to use local filtering and max-pooling in the frequency domain to normalize speaker variance and thereby achieve higher multi-speaker speech recognition performance. In our method, a pair of layers, one local filtering layer and one max-pooling layer, is added at the lowest end of the neural network (NN) to normalize spectral variations of speech signals. In our experiments, the proposed CNN…
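
The local filtering and max-pooling pair described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The 40-band input frame, the 8 filters of width 5, the pooling size of 3, the sigmoid nonlinearity, and the sharing of filter weights across all frequency positions are all assumptions chosen for illustration, and the function names `local_filter` and `max_pool` are hypothetical.

```python
import numpy as np

def local_filter(x, w, b):
    """Apply a bank of filters to every window of adjacent frequency bands.

    x: (n_bands,) spectral features for one frame
    w: (n_filters, width) filter weights, shared across positions (assumption)
    b: (n_filters,) biases
    Returns sigmoid activations of shape (n_filters, n_bands - width + 1).
    """
    n_filters, width = w.shape
    n_pos = x.shape[0] - width + 1
    # Collect each window of `width` adjacent bands: shape (n_pos, width).
    windows = np.stack([x[i:i + width] for i in range(n_pos)])
    pre = w @ windows.T + b[:, None]
    return 1.0 / (1.0 + np.exp(-pre))

def max_pool(h, size):
    """Non-overlapping max-pooling along the frequency axis.

    Positions that do not fill a complete pooling group are dropped.
    """
    n_filters, n_pos = h.shape
    n_out = n_pos // size
    return h[:, :n_out * size].reshape(n_filters, n_out, size).max(axis=2)

rng = np.random.default_rng(0)
x = rng.standard_normal(40)             # hypothetical 40-band frame
w = rng.standard_normal((8, 5)) * 0.1   # hypothetical 8 filters of width 5
b = np.zeros(8)

h = local_filter(x, w, b)   # local filtering layer output
p = max_pool(h, 3)          # max-pooling layer output, fed to the rest of the NN
```

Because each pooled unit keeps only the maximum over a few neighbouring frequency positions, a spectral pattern that moves slightly along the frequency axis can still produce a similar pooled value, which is the intuition behind normalizing spectral variation between speakers.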

Citation impact

Total citations: 893

FWCI: 74.28
Percentile: 100%
References: 15

Authors: 4

Topics & keywords

Keywords

TIMIT
Computer science
Speech recognition
Convolutional neural network
Pooling
Hidden Markov model
Pattern recognition (psychology)
Artificial intelligence
