AVLTrack: Dynamic Sparse Learning for Aerial Vision-Language Tracking
Indexed in Crossref
Abstract
The introduction of natural language for vision-language (VL) tracking has been shown to improve performance. However, natural language remains under-explored in existing aerial trackers. Moreover, existing VL trackers ignore the misalignment of language with dynamic target states, which is prominent in complex UAV scenarios. In this work, we present AVLTrack, a flexible framework for aerial vision-language tracking. It consists of three key components: a dynamic sparse learning (DSL) module, an efficient Transformer backbone, and a multi-level language perception (MLP) strategy. First, DSL sparsely connects language and images via dynamic sparse attention, providing accurate multi-modal prompts. To adapt to…
Citation impact
- Total citations: 57
- FWCI: 57.86
- Percentile: 100%
- References: 83
Authors: 7
Topics & keywords
Topics
Keywords
- Computer vision
- Computer science
- Artificial intelligence
- Tracking (education)