Cross-Modal Self-Attention Network for Referring Image Segmentation

Ye, Linwei; Rochan, Mrigank; Liu, Zhi; Wang, Yang

doi:10.1109/cvpr.2019.01075

Article · Jun 1, 2019 · Closed access

University of Manitoba · Shanghai University

Indexed in Crossref

Abstract

We consider the problem of referring image segmentation. Given an input image and a natural language expression, the goal is to segment the object referred to by the language expression in the image. Existing works in this area treat the language expression and the input image separately in their representations. They do not sufficiently capture long-range correlations between these two modalities. In this paper, we propose a cross-modal self-attention (CMSA) module that effectively captures the long-range dependencies between linguistic and visual features. Our model can adaptively focus on informative words in the referring expression and important regions in the input image. In addition, we propose a gated…
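
The idea of attending jointly over words and image regions can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the features are combined, so the code assumes one joint feature per (region, word) pair formed by concatenation, scaled dot-product attention in the Transformer style, and averaging over words to get one vector per region. The function name `cross_modal_self_attention`, the projection matrices `Wq`, `Wk`, `Wv`, and all dimensions are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_modal_self_attention(visual, words, Wq, Wk, Wv):
    """Illustrative sketch only (assumed design, not the paper's code).

    visual: list of P region feature vectors.
    words:  list of N word feature vectors.
    Returns (per_region, weights): one attended vector per region, and the
    attention weights of every (region, word) pair over all pairs.
    """
    # Assumption: a joint feature is the concatenation of a region vector
    # and a word vector, giving P * N cross-modal features.
    joint = [v + w for v in visual for w in words]

    queries = [matvec(Wq, f) for f in joint]
    keys = [matvec(Wk, f) for f in joint]
    values = [matvec(Wv, f) for f in joint]
    scale = math.sqrt(len(queries[0]))  # Transformer-style scaling (assumed)

    attended, weights = [], []
    for q in queries:
        # Each pair attends to every other pair, so a word at one region
        # can interact with any word at any other region.
        w = softmax([dot(q, k) / scale for k in keys])
        out = [sum(wi * v[d] for wi, v in zip(w, values))
               for d in range(len(values[0]))]
        attended.append(out)
        weights.append(w)

    # Assumption: collapse the word axis by averaging, one vector per region.
    n = len(words)
    per_region = []
    for p in range(len(visual)):
        group = attended[p * n:(p + 1) * n]
        per_region.append([sum(col) / n for col in zip(*group)])
    return per_region, weights


def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)]
            for _ in range(rows)]


random.seed(0)
visual = rand_matrix(4, 3)  # 4 image regions, 3-dim features (toy sizes)
words = rand_matrix(2, 3)   # 2 words, 3-dim embeddings (toy sizes)
Wq, Wk, Wv = rand_matrix(5, 6), rand_matrix(5, 6), rand_matrix(5, 6)
per_region, weights = cross_modal_self_attention(visual, words, Wq, Wk, Wv)
```

Because every (region, word) pair attends to all other pairs, the cost grows quadratically in the number of pairs; a practical model would use batched tensor operations rather than Python lists.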

Citation impact

Total citations: 508

FWCI: 19.80
Percentile: 100%
References: 39

Authors: 4

Topics & keywords

Keywords

Computer science
Expression (computer science)
Image (mathematics)
Artificial intelligence
Focus (optics)
Modal
Image segmentation
Segmentation

UN Sustainable Development Goals

Quality Education
