Preprint · arXiv (Cornell University) · Sep 29, 2022 · Green OA

Make-A-Video: Text-to-Video Generation without Text-Video Data

Indexed in: arXiv, DataCite

Abstract

We propose Make-A-Video, an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage. Make-A-Video has three advantages: (1) it accelerates training of the T2V model (it does not need to learn visual and multimodal representations from scratch), (2) it does not require paired text-video data, and (3) the generated videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.) of today's image generation models. We design a simple yet effective way to…

Citation impact

313 total citations
References: 0

Authors

13 authors

Topics & keywords

Keywords
  • Computer science
  • Video post-processing
  • Pipeline (software)
  • Artificial intelligence
  • Video processing
  • Interpolation (computer graphics)
  • Computer vision
  • Video compression picture types