Structure and Content-Guided Video Synthesis with Diffusion Models

Esser, Patrick; Chiu, Johnathan; Atighehchian, Parmida; Granskog, Jonathan; Germanidis, Anastasis

doi:10.1109/iccv51070.2023.00675

Article · Oct 1, 2023 · Closed access

Indexed in Crossref

Abstract

Text-guided generative diffusion models unlock powerful image creation and editing tools. Recent approaches that edit the content of footage while retaining structure require expensive re-training for every input or rely on error-prone propagation of image edits across frames. In this work, we present a structure and content-guided video diffusion model that edits videos based on descriptions of the desired output. Conflicts between user-provided content edits and structure representations occur due to insufficient disentanglement between the two aspects. As a solution, we show that training on monocular depth estimates with varying levels of detail provides control over structure and content fidelity. A novel…

Citation impact

Total citations: 342

FWCI: 38.84
Percentile: 100%
References: 55

Authors: 5

Topics & keywords

Keywords

Computer science
Consistency (knowledge bases)
Personalization
Fidelity
Generative grammar
Generative model
Control (management)
Variety (cybernetics)