Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering

Sun Yat-sen University

Indexed in: Crossref, PubMed

Abstract

Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture the temporality, causality, and dynamics of events spanning the video. In this work, to address the task of event-level visual question answering, we propose a framework for cross-modal causal relational reasoning. In particular, a set of causal intervention operations is introduced to discover the underlying causal structures across visual and linguistic modalities. Our framework, named Cross-Modal Causal RelatIonal Reasoning (CMCIR), involves three modules: i) Causality-aware Visual-Linguistic Reasoning (CVLR) module for collaboratively…

Citation impact

Total citations: 330
FWCI: 16.43
Percentile: 100%
References: 110

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Causal reasoning
  • Question answering
  • Artificial intelligence
  • Natural language processing
  • Event (particle physics)
  • Spurious relationship
  • Visual reasoning

Funding