Imagen Video: High Definition Video Generation with Diffusion Models
Indexed in: arXiv, DataCite
Abstract
We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions, and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting.…
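The v-parameterization mentioned in the abstract can be illustrated with a small numeric sketch. It assumes the usual variance-preserving setup, where a noisy sample is z_t = alpha_t * x + sigma_t * eps with alpha_t^2 + sigma_t^2 = 1, and the prediction target is v = alpha_t * eps - sigma_t * x. The cosine schedule and all function names below are illustrative assumptions, not details taken from this record or from the paper's implementation.

```python
import math

def alpha_sigma(t):
    # Assumed cosine schedule (hypothetical choice): alpha^2 + sigma^2 = 1 for t in [0, 1].
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

def diffuse(x, eps, t):
    # Forward process: z_t = alpha_t * x + sigma_t * eps.
    a, s = alpha_sigma(t)
    return [a * xi + s * ei for xi, ei in zip(x, eps)]

def v_target(x, eps, t):
    # v-parameterization target: v = alpha_t * eps - sigma_t * x.
    a, s = alpha_sigma(t)
    return [a * ei - s * xi for xi, ei in zip(x, eps)]

def x_from_v(z, v, t):
    # Recover the clean sample: x = alpha_t * z_t - sigma_t * v.
    a, s = alpha_sigma(t)
    return [a * zi - s * vi for zi, vi in zip(z, v)]

def eps_from_v(z, v, t):
    # Recover the noise: eps = sigma_t * z_t + alpha_t * v.
    a, s = alpha_sigma(t)
    return [s * zi + a * vi for zi, vi in zip(z, v)]

# Toy "pixels" and noise, chosen by hand for illustration.
x = [0.5, -1.0, 2.0]
eps = [0.1, 0.3, -0.7]
t = 0.3

z = diffuse(x, eps, t)
v = v_target(x, eps, t)
x_rec = x_from_v(z, v, t)
eps_rec = eps_from_v(z, v, t)
```

In this sketch a model that predicts v exactly lets both the clean sample and the noise be recovered from z_t, which is the identity the round trip above checks.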
Citation impact
- Total citations: 346
- FWCI: —
- Percentile: —
- References: 0
Authors: 11
Topics & keywords
Topics
Keywords
- Computer science
- Video tracking
- Video compression picture types
- Video processing
- Artificial intelligence
- Computer vision
- Video quality