AVLTrack: Dynamic Sparse Learning for Aerial Vision-Language Tracking

Guangxi Normal University

Indexed in Crossref

Abstract

Introducing natural language into vision-language (VL) tracking has been shown to improve performance. However, natural language remains under-explored in existing aerial trackers. Moreover, existing VL trackers ignore the misalignment between language and dynamic target states, which is prominent in complex UAV scenarios. In this work, we present AVLTrack, a flexible framework for aerial vision-language tracking. It consists of three key components: a dynamic sparse learning (DSL) module, an efficient Transformer backbone, and a multi-level language perception (MLP) strategy. First, DSL sparsely connects language and images via dynamic sparse attention, providing accurate multi-modal prompts. To adapt to…

Citation impact

Total citations: 57
FWCI: 57.86
Percentile: 100%
References: 83

Authors

7

Topics & keywords

Keywords
  • Computer vision
  • Computer science
  • Artificial intelligence
  • Tracking (education)

Funding