Make-A-Video: Text-to-Video Generation without Text-Video Data

Singer, Uriel; Polyak, Adam; Hayes, Thomas; Yin, Xi; An, Jie; Zhang, Songyang; Hu, Qiyuan; Yang, Harry; Ashual, Oron; Gafni, Oran; Parikh, Devi; Gupta, Sonal; Taigman, Yaniv

doi:10.48550/arxiv.2209.14792

Preprint, arXiv (Cornell University), Sep 29, 2022, Green OA

Indexed in: arXiv, DataCite

Abstract

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage. Make-A-Video has three advantages: (1) it accelerates training of the T2V model (it does not need to learn visual and multimodal representations from scratch), (2) it does not require paired text-video data, and (3) the generated videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.) of today's image generation models. We design a simple yet effective way to…

Citation impact

Total citations: 313
FWCI: not available
Percentile: not available
References: 0
Authors: 13

Topics & keywords

Keywords

Computer science
Video post-processing
Pipeline (software)
Artificial intelligence
Video processing
Interpolation (computer graphics)
Computer vision
Video compression picture types
