Transformer-Based Visual Segmentation: A Survey

Li, Xiangtai; Ding, Henghui; Yuan, Haobo; Zhang, Wenwei; Pang, Jiangmiao; Cheng, Guangliang; Chen, Kai; Liu, Ziwei; Loy, Chen Change

doi:10.1109/tpami.2024.3434373

Article · IEEE Transactions on Pattern Analysis and Machine Intelligence · Jul 29, 2024 · Closed access

Xiangtai Li · Henghui Ding · Haobo Yuan · Wenwei Zhang · Jiangmiao Pang

Nanyang Technological University · Fudan University · +2 more institutions

Indexed in: Crossref, PubMed

Abstract

Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a…

Citation impact

Total citations: 189

FWCI: 38.76
Percentile: 100%
References: 398

Authors: 9

Topics & keywords

Keywords

Computer science
Segmentation
Artificial intelligence
Convolutional neural network
Deep learning
Point cloud
Transformer
Architecture

Funding

Alan Turing Institute
Award: SDCfP2\100009