Visual Prompt Multi-Modal Tracking

Zhu, Jiawen; Lai, Simiao; Chen, Xin; Wang, Dong; Lu, Huchuan

doi:10.1109/cvpr52729.2023.00918

Article · Jun 1, 2023 · Closed access

Dalian University of Technology · Peng Cheng Laboratory

Indexed in Crossref

Abstract

Visible-modal object tracking gives rise to a series of downstream multi-modal tracking tributaries. To inherit the powerful representations of the foundation model, a natural modus operandi for multi-modal tracking is full fine-tuning on the RGB-based parameters. Albeit effective, this manner is not optimal due to the scarcity of downstream data and poor transferability, etc. In this paper, inspired by the recent success of the prompt learning in language models, we develop Visual Prompt multi-modal Tracking (ViPT), which learns the modal-relevant prompts to adapt the frozen pre-trained foundation model to various downstream multi-modal tracking tasks. ViPT finds a better way to stimulate the knowledge of the…

Citation impact

Total citations: 296

FWCI: 33.62
Percentile: 100%
References: 91

Authors: 5

Topics & keywords

Keywords

Computer science
Modal
RGB color model
Artificial intelligence
Tracking (education)
Downstream (manufacturing)
Computer vision
Eye tracking

UN Sustainable Development Goals

No poverty
