Imagen Video: High Definition Video Generation with Diffusion Models

Ho, Jonathan; Chan, William; Saharia, Chitwan; Whang, Jay; Gao, Ruiqi; Gritsenko, Alexey A.; Kingma, Diederik P.; Poole, Ben; Norouzi, Mohammad; Fleet, David J.; Salimans, Tim

doi:10.48550/arxiv.2210.02303

Preprint · arXiv (Cornell University) · Oct 5, 2022 · Green OA


Indexed in: arXiv, DataCite

Abstract

We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions, and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting.…
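The v-parameterization named in the abstract can be illustrated with a small sketch. It assumes a variance-preserving noise schedule written as alpha_t = cos(phi) and sigma_t = sin(phi), so that alpha_t^2 + sigma_t^2 = 1. The function name v_param_roundtrip and the use of scalar inputs are hypothetical choices made for illustration; this is not code from the paper. Under this parameterization the denoising network predicts v = alpha_t * eps - sigma_t * x instead of the noise eps, and both the clean sample x and the noise eps can be recovered from the noisy latent z_t and v.

```python
import math


def v_param_roundtrip(x, eps, phi):
    """Hypothetical sketch of the v-parameterization for one diffusion step.

    Assumes a variance-preserving schedule alpha = cos(phi), sigma = sin(phi),
    with phi in [0, pi/2]. x is a clean data value, eps a noise sample.
    Returns the noisy latent z, the v-target, and x and eps recovered from (z, v).
    """
    alpha, sigma = math.cos(phi), math.sin(phi)
    z = alpha * x + sigma * eps        # forward process: noisy latent z_t
    v = alpha * eps - sigma * x        # v-target a network would be trained to predict
    x_hat = alpha * z - sigma * v      # recover x from (z_t, v); no division needed
    eps_hat = sigma * z + alpha * v    # recover eps from (z_t, v)
    return z, v, x_hat, eps_hat


if __name__ == "__main__":
    z, v, x_hat, eps_hat = v_param_roundtrip(0.5, -1.2, math.pi / 3)
    print(f"z={z:.4f} v={v:.4f} x_hat={x_hat:.4f} eps_hat={eps_hat:.4f}")
```

Recovering x as alpha_t * z_t - sigma_t * v involves no division, whereas recovering x from an eps-prediction uses (z_t - sigma_t * eps) / alpha_t, which becomes ill-conditioned as alpha_t approaches 0 at high noise levels.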

Citation impact: 346 total citations


Authors: 11

Topics & keywords

Keywords

Computer science
Video tracking
Video compression picture types
Video processing
Artificial intelligence
Computer vision
Video quality
