Make-A-Video: Text-to-Video Generation without Text-Video Data
Indexed in: arXiv, DataCite
Abstract
We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage. Make-A-Video has three advantages: (1) it accelerates training of the T2V model (it does not need to learn visual and multimodal representations from scratch), (2) it does not require paired text-video data, and (3) the generated videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.) of today's image generation models. We design a simple yet effective way to…
Citation impact
313 total citations
- FWCI: —
- Percentile: —
- References: 0
Authors
13
Topics & keywords
Topics
Keywords
- Computer science
- Video post-processing
- Pipeline (software)
- Artificial intelligence
- Video processing
- Interpolation (computer graphics)
- Computer vision
- Video compression picture types