FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

Qi, Chenyang; Cun, Xiaodong; Zhang, Yong; Lei, Chenyang; Wang, Xintao; Shan, Ying; Chen, Qifeng

doi:10.1109/iccv51070.2023.01460

Article · Oct 1, 2023 · Closed access

Hong Kong University of Science and Technology · Tencent (China) · +1 more institution

Indexed in Crossref

Abstract

Diffusion-based generative models have achieved remarkable success in text-based image generation. However, because the generation process involves considerable randomness, it remains challenging to apply such models to real-world visual content editing, especially in videos. In this paper, we propose FateZero, a zero-shot text-based editing method for real-world videos that requires neither per-prompt training nor a use-specific mask. To edit videos consistently, we propose several techniques based on pre-trained models. First, in contrast to the straightforward DDIM inversion technique, our approach captures intermediate attention maps during inversion, which effectively retain both structural and motion information.…

Citation impact

Total citations: 181
FWCI: 20.56
Percentile: 100%
References: 52

Authors: 7

Topics & keywords

Keywords

Computer science
Image editing
Video editing
Randomness
Artificial intelligence
Shot (pellet)
Computer vision
Consistency (knowledge bases)
