Article · Oct 1, 2023 · Closed access

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

National University of Singapore · Tencent (China)

Indexed in Crossref

Abstract

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator. Despite their promising results, such a paradigm is computationally expensive. In this work, we propose a new T2V generation setting, One-Shot Video Tuning, where only one text-video pair is presented. Our model is built on state-of-the-art T2I diffusion models pre-trained on massive image data. We make two key observations: 1) T2I models can generate still images that represent verb terms; 2) extending T2I models to generate multiple images concurrently exhibits surprisingly good content consistency. To further learn continuous motion, we introduce Tune-A-Video,…

Citation impact

  • Total citations: 473
  • FWCI: 53.72
  • Percentile: 100%
  • References: 83

Authors

10

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Inference
  • Computer vision
  • Video tracking
  • Key (lock)
  • Shot (pellet)
  • Generator (circuit theory)

Funding