Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Wu, Jay Zhangjie; Ge, Yixiao; Wang, Xintao; Lei, Stan Weixian; Gu, Yuchao; Shi, Yufei; Hsu, Wynne; Shan, Ying; Qie, Xiaohu; Shou, Mike Zheng

doi:10.1109/iccv51070.2023.00701

Article · Oct 1, 2023 · Closed access

National University of Singapore · Tencent (China)

Indexed in Crossref

Abstract

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator. Despite their promising results, such a paradigm is computationally expensive. In this work, we propose a new T2V generation setting, One-Shot Video Tuning, where only one text-video pair is presented. Our model is built on state-of-the-art T2I diffusion models pre-trained on massive image data. We make two key observations: 1) T2I models can generate still images that represent verb terms; 2) extending T2I models to generate multiple images concurrently exhibits surprisingly good content consistency. To further learn continuous motion, we introduce Tune-A-Video,…

Citation impact

Total citations: 473

FWCI: 53.72
Percentile: 100%
References: 83

Authors: 10

Topics & keywords

Keywords

Computer science
Artificial intelligence
Inference
Computer vision
Video tracking
Key (lock)
Shot (pellet)
Generator (circuit theory)

Funding

National Research Foundation