Deep convolutional neural networks for LVCSR

Sainath, Tara N.; Mohamed, Abdelrahman; Kingsbury, Brian; Ramabhadran, Bhuvana

doi:10.1109/icassp.2013.6639347

Article · May 1, 2013 · Closed access

IBM (United States) · IBM Research - Thomas J. Watson Research Center · +1 more institution

Indexed in Crossref

Abstract

Convolutional Neural Networks (CNNs) are an alternative type of neural network that can be used to reduce spectral variations and model spectral correlations which exist in signals. Since speech signals exhibit both of these properties, CNNs are a more effective model for speech compared to Deep Neural Networks (DNNs). In this paper, we explore applying CNNs to large vocabulary speech tasks. First, we determine the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks. Specifically, we focus on how many convolutional layers are needed, what is the optimal number of hidden units, what is the best pooling strategy, and the best input feature type for CNNs. We then explore the behavior…
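As a rough illustration of the abstract's claim that CNNs can reduce spectral variations, the sketch below convolves a small filter along the frequency axis of a toy spectral frame and then max-pools the result. The band energies, the filter weights, and the pooling size of 3 are all hypothetical values chosen for the example; they are not taken from the paper, and the function names are illustrative only.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel along the frequency axis (no padding)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool(values, size):
    """Non-overlapping max pooling; an incomplete trailing window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Toy spectral frame: energy in 8 frequency bands with a single peak at band 1.
frame = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# The same peak shifted up by one band (e.g. speaker-dependent variation).
shifted = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

kernel = [0.5, 1.0, 0.5]  # hypothetical filter that responds to a local peak

conv_a = conv1d_valid(frame, kernel)
conv_b = conv1d_valid(shifted, kernel)

pooled_a = max_pool(conv_a, 3)
pooled_b = max_pool(conv_b, 3)

print(conv_a, conv_b)      # the convolution outputs differ between the frames
print(pooled_a, pooled_b)  # the pooled outputs agree for this one-band shift
```

In this toy case the one-band shift stays inside a single pooling window, so the pooled features are identical even though the convolution outputs are not. A shift that crosses a window boundary would change the pooled output, so the invariance is only local.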

Citation impact

Total citations: 1,069

FWCI: 69.23
Percentile: 100%
References: 19

Authors: 4

Topics & keywords

Keywords

Computer science
Convolutional neural network
Pooling
Artificial intelligence
Speech recognition
Focus (optics)
Vocabulary
Feature (linguistics)

UN Sustainable Development Goals

Quality Education