AVLTrack: Dynamic Sparse Learning for Aerial Vision-Language Tracking

Xue, Yuanliang; Zhong, Bineng; Jin, Guodong; Shen, Tao; Tan, Lining; Li, Ning; Zheng, Yaozong

doi:10.1109/tcsvt.2025.3549953

Article; IEEE Transactions on Circuits and Systems for Video Technology; Mar 17, 2025; Closed access

Guangxi Normal University

Indexed in Crossref

Abstract

Introducing natural language into vision-language (VL) tracking has been shown to improve performance. However, natural language remains under-explored in existing aerial trackers. Moreover, existing VL trackers ignore the misalignment of language with dynamic target states, which is prominent in complex UAV scenarios. In this work, we present AVLTrack, a flexible framework for aerial vision-language tracking. It consists of three key components: a dynamic sparse learning (DSL) module, an efficient Transformer backbone, and a multi-level language perception (MLP) strategy. First, DSL sparsely connects language and images via dynamic sparse attention, providing accurate multi-modal prompts. To adapt to…

Citation impact

Total citations: 57

FWCI: 57.86
Percentile: 100%
References: 83

Authors: 7

Keywords

Computer vision
Computer science
Artificial intelligence
Tracking (education)
