Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering
Indexed in Crossref, PubMed
Abstract
Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture event temporality, causality, and dynamics spanning over the video. In this work, to address the task of event-level visual question answering, we propose a framework for cross-modal causal relational reasoning. In particular, a set of causal intervention operations is introduced to discover the underlying causal structures across visual and linguistic modalities. Our framework, named Cross-Modal Causal RelatIonal Reasoning (CMCIR), involves three modules: i) Causality-aware Visual-Linguistic Reasoning (CVLR) module for collaboratively…
Citation impact
330 total citations
- FWCI: 16.43
- Percentile: 100%
- References: 110
Authors: 3
Topics & keywords
Topics
Keywords
- Computer science
- Causal reasoning
- Question answering
- Artificial intelligence
- Natural language processing
- Event (particle physics)
- Spurious relationship
- Visual reasoning