TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Shi, Dai

doi:10.1109/cvpr52733.2024.01683

Article · Jun 16, 2024 · Closed access

Indexed in Crossref

Abstract

Due to the depth degradation effect in residual connections, many efficient Vision Transformer models that rely on stacking layers for information exchange often fail to form sufficient information mixing, leading to unnatural visual perception. To address this issue, we propose Aggregated Attention, a biomimetic design-based token mixer that simulates biological foveal vision and continuous eye movement while enabling each token on the feature map to have a global perception. Furthermore, we incorporate learnable tokens that interact with conventional queries and keys, which further diversifies the generation of affinity matrices beyond merely relying on the similarity between queries and…

Citation impact

Total citations: 319
FWCI: 71.37
Percentile: 100%
References: 65

Authors

Dai Shi (corresponding author)

Topics & keywords

Keywords

Foveal
Computer science
Computer vision
Perception
Artificial intelligence
Visual perception
Psychology
Neuroscience