Article · IEEE Robotics and Automation Letters · Feb 24, 2025 · Closed access

TinyVLA: Toward Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation

Junjie Wen · Yichen Zhu · Jinming Li · Minjie Zhu · Zhibin Tang

Midea Group (China) · East China Normal University · +3 more institutions

Indexed in Crossref

Abstract

Vision-Language-Action (VLA) models have shown remarkable potential in visuomotor control and instruction comprehension through end-to-end learning processes. However, current VLA models face significant challenges: they are slow during inference and require extensive pre-training on large amounts of robotic data, making real-world deployment difficult. In this letter, we introduce a new family of compact vision-language-action models, called TinyVLA, which offers two key advantages over existing VLA models: (1) faster inference speeds, and (2) improved data efficiency, eliminating the need for a pre-training stage. Our framework incorporates two essential components to build TinyVLA: (1) initializing the policy…

Citation impact

Total citations: 47
FWCI: 46.35
Percentile: 100%
References: 44

Authors

13

Topics & keywords

Keywords
  • Action (physics)
  • Computer science
  • Artificial intelligence
  • Human–computer interaction
  • Computer vision

Funding