Article · Jun 16, 2024 · Closed access

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Dai Shi
Indexed in Crossref

Abstract

Due to the depth degradation effect in residual connections, many efficient Vision Transformer models that rely on stacking layers for information exchange often fail to achieve sufficient information mixing, which leads to unnatural visual perception. To address this issue, in this paper, we propose Aggregated Attention, a biomimetically designed token mixer that simulates biological foveal vision and continuous eye movement while giving each token on the feature map a global perception. Furthermore, we incorporate learnable tokens that interact with conventional queries and keys, which further diversifies the generation of affinity matrices beyond merely relying on the similarity between queries and…
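The core idea in the abstract (fovea-like fine-grained attention around each token combined with a coarse global view inside one token mixer) can be illustrated with a toy sketch. This is not the author's implementation. It assumes a 1D token sequence instead of a 2D feature map. It omits the query/key/value projections and multiple heads, so each token serves as its own query, key, and value. It uses non-overlapping average pooling to build the coarse tokens, and it models the learnable tokens as extra vectors whose dot product with the query is added to each local score. The names `aggregated_attention_1d`, `window`, `pool`, and `learnable` are hypothetical.

```python
import math
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def avg_pool(tokens: List[Vector], pool: int) -> List[Vector]:
    # Non-overlapping average pooling produces the coarse "peripheral" tokens.
    pooled = []
    for start in range(0, len(tokens), pool):
        chunk = tokens[start:start + pool]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled


def aggregated_attention_1d(
    x: List[Vector],
    window: int = 1,
    pool: int = 2,
    learnable: Optional[List[Vector]] = None,
) -> List[Vector]:
    """Toy sketch (assumption, not the paper's code): each token attends to
    its +/- `window` neighbours (fine, fovea-like) and to all pooled tokens
    (coarse, global) under a single shared softmax. `learnable`, if given,
    holds 2 * window + 1 hypothetical vectors, one per relative offset."""
    n, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    coarse = avg_pool(x, pool)
    out = []
    for i, q in enumerate(x):
        # Fine-grained keys: neighbours within the window, clipped at borders.
        local_idx = [j for j in range(i - window, i + window + 1) if 0 <= j < n]
        local_scores = []
        for j in local_idx:
            s = dot(q, x[j]) * scale
            if learnable is not None:
                # Query-to-learnable-token affinity added to the query-key score.
                s += dot(q, learnable[j - i + window]) * scale
            local_scores.append(s)
        # Coarse keys cover the whole sequence, giving every token a global view.
        global_scores = [dot(q, c) * scale for c in coarse]
        weights = softmax(local_scores + global_scores)
        values = [x[j] for j in local_idx] + coarse
        out.append([sum(w * v[k] for w, v in zip(weights, values)) for k in range(d)])
    return out
```

Because the local and pooled scores share one softmax, every token mixes information from the whole sequence within a single layer instead of relying only on stacked layers to propagate it. The learnable term also means the local weights are no longer a function of query-key similarity alone.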

Citation impact

Total citations: 319
FWCI: 71.37
Percentile: 100%
References: 65

Authors (1)

  • Dai Shi (corresponding author)

Topics & keywords

Keywords
  • Foveal
  • Computer science
  • Computer vision
  • Perception
  • Artificial intelligence
  • Visual perception
  • Psychology
  • Neuroscience