RepViT: Revisiting Mobile CNN From ViT Perspective

Wang, Ao; Chen, Hui; Lin, Zijia; Han, Jungong; Ding, Guiguang

doi:10.1109/cvpr52733.2024.01506

Article · Jun 16, 2024 · Closed access

Tsinghua University · University of Sheffield

Indexed in Crossref

Abstract

Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance and lower latency, compared with lightweight Convolutional Neural Networks (CNNs), on resource-constrained mobile devices. Researchers have discovered many structural connections between lightweight ViTs and lightweight CNNs. However, the notable architectural disparities in the block structure, macro, and micro designs between them have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs from ViT perspective and emphasize their promising prospect for mobile devices. Specifically, we incrementally enhance the mobile-friendliness of a standard lightweight CNN, i.e., MobileNetV3, by…

Citation impact

Total citations: 510

FWCI: 113.98
Percentile: 100%
References: 91

Authors: 5

Topics & keywords

Keywords

Perspective (graphical)
Computer science
Artificial intelligence
