Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks
Jiangsu University · Wayne State University
Abstract
As an essential means of understanding human emotional behavior, speech emotion recognition (SER) has attracted a great deal of attention in human-centered signal processing. Accuracy in SER depends heavily on finding good affect-related, discriminative features. In this paper, we propose to learn affect-salient features for SER using convolutional neural networks (CNN). The training of the CNN involves two stages. In the first stage, unlabeled samples are used to learn local invariant features (LIF) using a variant of sparse auto-encoder (SAE) with reconstruction penalization. In the second stage, LIF is used as the input to a feature extractor, salient discriminative feature analysis (SDFA), to learn…
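The first training stage in the abstract relies on a sparse auto-encoder. The excerpt does not specify the paper's variant or the form of its reconstruction penalization, so the following is only a minimal sketch of a generic sparse auto-encoder objective: squared reconstruction error plus a KL-divergence sparsity penalty on the mean hidden activations. The function names, the linear decoder, and the values of `rho` and `beta` are illustrative assumptions, not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, W, b):
    # Hidden activation j = sigmoid(W[j] . x + b[j]).
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def decode(h, W, b):
    # Linear decoder (an assumption): output i = W[i] . h + b[i].
    return [sum(w * hj for w, hj in zip(row, h)) + bi
            for row, bi in zip(W, b)]

def kl_bernoulli(rho, rho_hat, eps=1e-12):
    # KL divergence between Bernoulli(rho) and Bernoulli(rho_hat),
    # with rho_hat clamped away from 0 and 1 to keep the logs finite.
    rho_hat = min(max(rho_hat, eps), 1.0 - eps)
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))

def sparse_autoencoder_loss(X, W_enc, b_enc, W_dec, b_dec, rho=0.05, beta=3.0):
    """Return (total, reconstruction_error, sparsity_penalty).

    rho (target mean activation) and beta (penalty weight) are
    hypothetical defaults chosen for illustration only.
    """
    hidden = [encode(x, W_enc, b_enc) for x in X]
    recon = [decode(h, W_dec, b_dec) for h in hidden]
    n = len(X)
    # Mean over samples of half the squared reconstruction error.
    recon_err = sum(
        sum((r - xi) ** 2 for r, xi in zip(rx, x))
        for rx, x in zip(recon, X)
    ) / (2.0 * n)
    # Average activation of each hidden unit over the unlabeled batch.
    rho_hat = [sum(h[j] for h in hidden) / n for j in range(len(b_enc))]
    sparsity = sum(kl_bernoulli(rho, r) for r in rho_hat)
    return recon_err + beta * sparsity, recon_err, sparsity

# Toy unlabeled batch: 2 samples, 2 inputs, 3 hidden units, all-zero weights.
X = [[1.0, 0.0], [0.0, 1.0]]
W_enc = [[0.0, 0.0] for _ in range(3)]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[0.0, 0.0, 0.0] for _ in range(2)]
b_dec = [0.0, 0.0]
total, recon_err, sparsity = sparse_autoencoder_loss(X, W_enc, b_enc, W_dec, b_dec)
```

With all-zero weights every hidden unit sits at 0.5, far from the sparse target, so the penalty term is positive; training would push the mean activations toward `rho` while keeping the reconstruction error low.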
Citation impact
- FWCI: 23.99
- Percentile: 100%
- References: 56
Authors
- 4
Topics & keywords
- Discriminative model
- Computer science
- Convolutional neural network
- Artificial intelligence
- Salient
- Pattern recognition (psychology)
- Speech recognition
- Feature (linguistics)
- Reduced inequalities