Efficient Multimodal Transformer With Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis
Chinese Academy of Sciences · Beijing Academy of Artificial Intelligence · Institute of Automation · University of Chinese Academy of Sciences · Tsinghua University
Abstract
With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently. Despite significant progress, two major challenges remain on the way towards robust MSA: 1) inefficiency when modeling cross-modal interactions in unaligned multimodal data; and 2) vulnerability to randomly missing modality features, which typically occur in realistic settings. In this paper, we propose a generic and unified framework to address them, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR). Concretely, EMT employs utterance-level representations from each modality as the global multimodal context to interact with local…
Citation impact
- FWCI: 35.10
- Percentile: 100%
- References: 71
Authors
- Licai Sun (corresponding author)
Chinese Academy of Sciences, Beijing Academy of Artificial Intelligence, Institute of Automation, University of Chinese Academy of Sciences
- Zheng Lian
Chinese Academy of Sciences, Institute of Automation
- Bin Liu
Chinese Academy of Sciences, Institute of Automation
- Jianhua Tao
Tsinghua University
Topics & keywords
- Computer science
- Robustness (evolution)
- Artificial intelligence
- Feature (linguistics)
- Feature learning
- Machine learning
- Sentiment analysis
- Pattern recognition (psychology)