Automatic speech emotion recognition using recurrent neural networks with local attention

Mirsamadi, Seyedmahdad; Barsoum, Emad; Zhang, Cha

doi:10.1109/icassp.2017.7952552

Article · Mar 1, 2017 · Closed access

The University of Texas at Dallas · Microsoft (United States)

Indexed in Crossref

Abstract

Automatic emotion recognition from speech is a challenging task which relies heavily on the effectiveness of the speech features used for classification. In this work, we study the use of deep learning to automatically discover emotionally relevant features from speech. It is shown that using a deep recurrent neural network, we can learn both the short-time frame-level acoustic features that are emotionally relevant, as well as an appropriate temporal aggregation of those features into a compact utterance-level representation. Moreover, we propose a novel strategy for feature pooling over time which uses local attention in order to focus on specific regions of a speech signal that are more emotionally salient.…
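
The attention-based pooling over time mentioned in the abstract can be illustrated with a minimal sketch. The abstract is truncated and gives no formulas, so everything below is an assumption made for illustration: frame-level vectors (for example, recurrent-layer outputs) are scored against a parameter vector `u`, the scores are normalised with a softmax over time, and the utterance-level representation is the resulting weighted sum of frames. The names `attention_pool`, `frames`, and `u` are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    frames: Sequence[Sequence[float]],
    u: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool frame vectors into one utterance vector with softmax attention.

    frames: one feature vector per time step (assumed: RNN outputs).
    u:      scoring vector (assumed: a learned parameter, fixed here).
    Returns (pooled_vector, attention_weights).
    """
    if not frames:
        raise ValueError("need at least one frame")

    # Score each frame by its inner product with u.
    scores = [sum(ui * yi for ui, yi in zip(u, y)) for y in frames]

    # Softmax over time; subtracting the max keeps exp() numerically stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frames gives the utterance-level representation.
    dim = len(frames[0])
    pooled = [sum(w * y[d] for w, y in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    toy_frames = [[1.0, 0.0], [3.0, 2.0]]
    pooled, weights = attention_pool(toy_frames, u=[10.0, 0.0])
    print("weights:", weights)
    print("pooled:", pooled)
```

With `u` set to all zeros every frame receives the same weight, so the sketch reduces to plain mean pooling over time; a non-zero `u` shifts the weight toward frames that score highly, which is the sense in which the pooling "focuses" on particular regions.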

Citation impact

Total citations: 731

FWCI: 63.34
Percentile: 100%
References: 16

Authors: 3

Topics & keywords

Keywords

Computer science
Pooling
Speech recognition
Utterance
Salient
Focus (optics)
Artificial intelligence
Feature (linguistics)
