Article · Jun 16, 2024 · Closed access

RepViT: Revisiting Mobile CNN From ViT Perspective

Tsinghua University · University of Sheffield

Indexed in Crossref

Abstract

Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on resource-constrained mobile devices. Researchers have discovered many structural connections between lightweight ViTs and lightweight CNNs. However, the notable architectural disparities between them in block structure, macro design, and micro design have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs from a ViT perspective and emphasize their promising prospects for mobile devices. Specifically, we incrementally enhance the mobile-friendliness of a standard lightweight CNN, i.e., MobileNetV3, by…


Funding