Article · Jun 16, 2024 · Closed access
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi
Indexed in Crossref
Abstract
Due to the depth degradation effect in residual connections, many efficient Vision Transformer models that rely on stacking layers for information exchange often fail to form sufficient information mixing, leading to unnatural visual perception. To address this issue, we propose Aggregated Attention, a biomimetic design-based token mixer that simulates biological foveal vision and continuous eye movement while enabling each token on the feature map to have a global perception. Furthermore, we incorporate learnable tokens that interact with conventional queries and keys, which further diversifies the generation of affinity matrices beyond merely relying on the similarity between queries and…
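The abstract is truncated here, so the paper's exact formulation is not shown. As a rough illustration only, the idea that learnable tokens can diversify an affinity matrix beyond plain query-key similarity might be sketched as below. The function `affinity`, the `learned_tokens` parameter, and the choice to add the extra score to the logits before the softmax are all assumptions made for this sketch, not the author's Aggregated Attention.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def affinity(queries, keys, learned_tokens=None):
    """Row-normalized affinity matrix from scaled query-key similarity.

    learned_tokens is a hypothetical extra input: one vector per key
    position. Each query is also scored against these vectors, and that
    score is added to the query-key logit before the softmax.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    rows = []
    for q in queries:
        logits = [dot(q, k) * scale for k in keys]
        if learned_tokens is not None:
            logits = [l + dot(q, t) for l, t in zip(logits, learned_tokens)]
        rows.append(softmax(logits))
    return rows


# Two identical keys: plain query-key similarity cannot tell them apart.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [1.0, 0.0]]
plain = affinity(queries, keys)

# Hypothetical learned tokens break the tie for the first query.
tokens = [[0.0, 0.0], [1.0, 0.0]]
mixed = affinity(queries, keys, learned_tokens=tokens)
```

With identical keys, the plain affinity rows are uniform, while the extra token term lets the first query weight the second position more heavily. In a trained model such tokens would be parameters updated by gradient descent rather than hand-set values.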
Citation impact
- Total citations: 319
- FWCI: 71.37
- Percentile: 100%
- References: 65
Authors
1. Dai Shi (corresponding author)
Topics & keywords
Keywords
- Foveal
- Computer science
- Computer vision
- Perception
- Artificial intelligence
- Visual perception
- Psychology
- Neuroscience