emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Ma, Ziyang; Zheng, Zhisheng; Ye, Jiaxin; Li, Jinchao; Gao, Zhifu; Zhang, Shiliang; Chen, Xie

doi:10.18653/v1/2024.findings-acl.931

Article · Jan 1, 2024 · Gold OA



Indexed in Crossref

Abstract

We propose emotion2vec, a universal speech emotion representation model. emotion2vec is pre-trained on open-source unlabeled emotion data through self-supervised online distillation, combining an utterance-level loss and a frame-level loss during pre-training. With only linear layers trained for the speech emotion recognition task, emotion2vec outperforms state-of-the-art pre-trained universal models and emotion specialist models on the mainstream IEMOCAP dataset. In addition, emotion2vec shows consistent improvements across speech emotion recognition datasets in 10 different languages. emotion2vec also shows excellent results on other emotion tasks, such as song emotion recognition, emotion prediction in conversation, and…
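The abstract names two ingredients of pre-training: online distillation and a combined utterance-level plus frame-level loss. The sketch below illustrates that combination under stated assumptions; it is not the paper's implementation. Assumed (not stated in the abstract): the teacher is an exponential moving average (EMA) of the student, in the style of data2vec; the frame-level loss is mean squared error on masked frames; the utterance-level target is the mean-pooled teacher output; and the weight `alpha` and all function names are hypothetical. Plain lists stand in for tensors so the example needs no libraries.

```python
def ema_update(teacher, student, decay):
    """Online distillation (assumed EMA form): nudge each teacher weight toward the student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean_pool(frames):
    """Average frame vectors into a single utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def combined_loss(student_frames, teacher_frames, student_utt, masked, alpha=1.0):
    """Hypothetical frame-level + utterance-level objective.

    student_frames / teacher_frames: per-frame vectors from student and teacher.
    student_utt: the student's utterance-level embedding.
    masked: indices of frames masked in the student input (assumption).
    alpha: hypothetical weight on the utterance-level term.
    """
    # Frame-level term: regress teacher targets only at masked positions.
    frame_terms = [mse(student_frames[i], teacher_frames[i]) for i in masked]
    frame_loss = sum(frame_terms) / len(frame_terms)
    # Utterance-level term: match the student's utterance embedding to pooled teacher frames.
    utt_loss = mse(student_utt, mean_pool(teacher_frames))
    return frame_loss + alpha * utt_loss
```

In a training loop under these assumptions, only the student receives gradients from `combined_loss`; the teacher is refreshed with `ema_update` after each step, so distillation targets are produced on the fly rather than from a fixed pre-trained model.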
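The evaluation protocol described in the abstract trains only linear layers on top of emotion2vec features. A minimal linear-probe sketch, assuming the utterance features have already been extracted and stay frozen: a single softmax-regression layer is fit with batch gradient descent. The function names, learning rate, and epoch count are hypothetical, toy 2-D vectors stand in for real embeddings, and the paper's actual classifier head and training setup may differ.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=200):
    """Fit weights W and bias b of one linear layer with cross-entropy loss.

    The input features are treated as frozen: only W and b are updated.
    """
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    n = len(features)
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(num_classes)]
        gb = [0.0] * num_classes
        for x, y in zip(features, labels):
            scores = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(num_classes)]
            p = softmax(scores)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. the score of class c.
                err = p[c] - (1.0 if c == y else 0.0)
                gb[c] += err
                for d in range(dim):
                    gW[c][d] += err * x[d]
        # Average the gradient over the batch and take one descent step.
        for c in range(num_classes):
            b[c] -= lr * gb[c] / n
            for d in range(dim):
                W[c][d] -= lr * gW[c][d] / n
    return W, b


def predict(W, b, x):
    """Return the index of the highest-scoring class."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Keeping the upstream representation frozen means the probe's accuracy mostly reflects how linearly separable the emotion classes already are in the feature space, which is why linear-only training is a common way to compare pre-trained representations.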

Citation impact

Total citations: 116

FWCI: 37.18
Percentile: 100%
References: 0


Authors: 7

Topics & keywords

Keywords

Computer science
Training (meteorology)
Speech recognition
Representation (politics)
Artificial intelligence
Emotion recognition
Self representation
Natural language processing
