TransXNet: Learning Both Global and Local Dynamics With a Dual Dynamic Token Mixer for Visual Recognition

Lou, Meng; Zhang, Shu; Zhou, Hong-Yu; Yang, Sibei; Wu, Chuan; Yu, Yizhou

doi:10.1109/tnnls.2025.3550979

Article · IEEE Transactions on Neural Networks and Learning Systems · Apr 3, 2025 · Closed access

Chinese University of Hong Kong · University of Hong Kong · +2 more institutions

Indexed in: Crossref, PubMed

Abstract

Recent studies have integrated convolutions into transformers to introduce inductive bias and improve generalization performance. However, the static nature of conventional convolution prevents it from dynamically adapting to input variations, resulting in a representation discrepancy between convolution and self-attention as self-attention calculates attention matrices dynamically. Furthermore, when stacking token mixers that consist of convolution and self-attention to form a deep network, the static nature of convolution hinders the fusion of features previously generated by self-attention into convolution kernels. These two limitations result in a suboptimal representation capacity of the constructed…
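The contrast the abstract draws between static convolution and dynamically computed attention can be illustrated with a minimal sketch on 1D scalar tokens. This is not the paper's token mixer: `conv1d`, `attention_weights`, `dynamic_kernel`, the kernel bank, and the mean-based gating are all hypothetical names and choices made only for illustration. The sketch shows a fixed kernel that is identical for every input, attention weights that are recomputed from each input, and one generic way to make a convolution kernel depend on the input.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d(x, kernel):
    """Zero-padded 'same' 1D cross-correlation with an odd-length kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def attention_weights(x):
    """Row-wise softmax of pairwise token products: the weights depend on x."""
    return [softmax([xi * xj for xj in x]) for xi in x]


def dynamic_kernel(x, kernel_bank):
    """Assumption for illustration: mix a bank of kernels with coefficients
    computed from the mean of x, so the kernel changes with the input."""
    mean = sum(x) / len(x)
    # Hypothetical gating: one logit per kernel, scaled by the input mean.
    coeffs = softmax([mean * (i + 1) for i in range(len(kernel_bank))])
    size = len(kernel_bank[0])
    return [sum(c * kb[t] for c, kb in zip(coeffs, kernel_bank))
            for t in range(size)]


STATIC_KERNEL = [0.25, 0.5, 0.25]           # same weights for every input
KERNEL_BANK = [[0.25, 0.5, 0.25], [-1.0, 0.0, 1.0]]

a = [1.0, 2.0, 3.0, 4.0]
b = [4.0, 0.0, -4.0, 0.0]

static_a = conv1d(a, STATIC_KERNEL)         # kernel ignores the input content
attn_a = attention_weights(a)               # weights recomputed for a
attn_b = attention_weights(b)               # and differ for b
dyn_a = conv1d(a, dynamic_kernel(a, KERNEL_BANK))
dyn_b = conv1d(b, dynamic_kernel(b, KERNEL_BANK))
```

In this sketch `STATIC_KERNEL` never changes, while `dynamic_kernel(a, KERNEL_BANK)` and `dynamic_kernel(b, KERNEL_BANK)` return different kernels, which is the kind of input adaptivity the abstract says conventional convolution lacks.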

Citation impact

Total citations: 59

FWCI: 126.05
Percentile: 100%
References: 59

Authors: 6

Topics & keywords

Keywords

Dynamics (music)
Dual (grammatical number)
Computer science
Security token
Artificial intelligence
Speech recognition
Computer network
Psychology
