Qwen2.5-VL Technical Report

Bai, Shuai; Chen, Keqin; Liu, Xuejing; Wang, Jialin; Ge, Wenbin; Song, Sibo; Dang, Kai; Wang, Peng; Wang, Shijie; Tang, Jun; Zhong, Humen; Zhu, Yuanzhi; Yang, Mingkun; Li, Zhaohai; Wan, Jianqiang; Wang, Pengfei; Ding, Wei; Fu, Zhangping; Xu, Yuanzhong; Ye, Jiabo; Zhang, X.-C.; Xie, Tianbao; Cheng, Zesen; Zhang, Hang; Yang, Zhibo; Xu, Haiyang; Lin, Junyang

doi:10.48550/arxiv.2502.13923

Preprint, arXiv.org, Feb 19, 2025, Green OA

Indexed in: arXiv, DataCite

Abstract

We introduce Qwen2.5-VL, the latest flagship model of the Qwen vision-language series, which demonstrates significant advancements in both foundational capabilities and innovative functionalities. Qwen2.5-VL achieves a major leap forward in understanding and interacting with the world through enhanced visual recognition, precise object localization, robust document parsing, and long-video comprehension. A standout feature of Qwen2.5-VL is its ability to accurately localize objects using bounding boxes or points. It provides robust structured data extraction from invoices, forms, and tables, as well as detailed analysis of charts, diagrams, and layouts. To handle complex inputs, Qwen2.5-VL introduces dynamic…

Citation impact

52 total citations

FWCI: —
Percentile: —
References: 0

Authors

27

Topics & keywords

Topics

Semiconductor Lasers and Optical Devices (32%)

Keywords

Business
