Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks

Mao, Qirong; Dong, Ming; Huang, Zheng-Wei; Zhan, Yongzhao

doi:10.1109/tmm.2014.2360798

Article · IEEE Transactions on Multimedia · Sep 29, 2014 · Closed access

Qirong Mao · Ming Dong · Zheng-Wei Huang · Yongzhao Zhan

Jiangsu University · Wayne State University

Indexed in Crossref

Abstract

As an essential way of human emotional behavior understanding, speech emotion recognition (SER) has attracted a great deal of attention in human-centered signal processing. Accuracy in SER heavily depends on finding good affect-related, discriminative features. In this paper, we propose to learn affect-salient features for SER using convolutional neural networks (CNN). The training of CNN involves two stages. In the first stage, unlabeled samples are used to learn local invariant features (LIF) using a variant of sparse auto-encoder (SAE) with reconstruction penalization. In the second step, LIF is used as the input to a feature extractor, salient discriminative feature analysis (SDFA), to learn…

Citation impact

557 total citations

FWCI: 23.99
Percentile: 100%
References: 56

Authors: 4

Topics & keywords

Keywords

Discriminative model
Computer science
Convolutional neural network
Artificial intelligence
Salient
Pattern recognition (psychology)
Speech recognition
Feature (linguistics)

UN Sustainable Development Goals

Reduced inequalities

Funding

National Natural Science Foundation of China
Awards: 61170126, 61272211
