Article · Jun 1, 2023 · Closed access

Visual Prompt Multi-Modal Tracking

Dalian University of Technology · Peng Cheng Laboratory

Indexed in Crossref

Abstract

Visible-modal object tracking gives rise to a series of downstream multi-modal tracking tributaries. To inherit the powerful representations of the foundation model, a natural modus operandi for multi-modal tracking is full fine-tuning on the RGB-based parameters. Although effective, this approach is suboptimal because downstream data is scarce and transferability is poor, among other issues. In this paper, inspired by the recent success of prompt learning in language models, we develop Visual Prompt multi-modal Tracking (ViPT), which learns modal-relevant prompts to adapt the frozen pre-trained foundation model to various downstream multi-modal tracking tasks. ViPT finds a better way to stimulate the knowledge of the…

Citation impact

Total citations: 296
FWCI: 33.62
Percentile: 100%
References: 91

Authors

5

Topics & keywords

Keywords
  • Computer science
  • Modal
  • RGB color model
  • Artificial intelligence
  • Tracking (education)
  • Downstream (manufacturing)
  • Computer vision
  • Eye tracking
UN Sustainable Development Goals
  • No poverty

Funding