Article · Oct 1, 2023 · Closed access

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

Hong Kong University of Science and Technology · Tencent (China) · +1 more institution

Indexed in Crossref

Abstract

Diffusion-based generative models have achieved remarkable success in text-based image generation. However, because the generation process involves substantial randomness, it remains challenging to apply such models to real-world visual content editing, especially for videos. In this paper, we propose FateZero, a zero-shot text-based editing method for real-world videos that requires neither per-prompt training nor a use-specific mask. To edit videos consistently, we propose several techniques based on pre-trained models. First, in contrast to straightforward DDIM inversion, our approach captures intermediate attention maps during inversion, which effectively retain both structural and motion information.…
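
The idea of recording attention maps during inversion can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_denoiser`, its random projection weights, the token and feature sizes, and the linear noise schedule are all hypothetical stand-ins for a pre-trained diffusion model. The sketch only shows the deterministic DDIM inversion update, with the self-attention map from each step kept for later reuse.

```python
import numpy as np

def toy_denoiser(x, weights):
    """Hypothetical stand-in for a pre-trained noise predictor.

    Returns a noise estimate and the self-attention map used to compute it.
    """
    q, k, v = x @ weights["q"], x @ weights["k"], x @ weights["v"]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)  # row-wise softmax
    return attn @ v, attn

def ddim_invert(x0, alphas_cumprod, weights):
    """Deterministic DDIM inversion that stores one attention map per step."""
    x = x0.copy()
    stored_maps = []
    for t in range(len(alphas_cumprod) - 1):
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t + 1]
        eps, attn = toy_denoiser(x, weights)
        stored_maps.append(attn)  # kept so an editing pass could reuse it
        # Predict the clean sample, then move one step toward higher noise.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps
    return x, stored_maps

# Assumed toy setup: 4 tokens with 8 features each, 10 inversion steps.
rng = np.random.default_rng(0)
tokens, dim = 4, 8
weights = {name: rng.normal(scale=0.1, size=(dim, dim)) for name in "qkv"}
alphas_cumprod = np.linspace(0.999, 0.01, 11)  # decreasing: more noise per step
x0 = rng.normal(size=(tokens, dim))

latent, attn_maps = ddim_invert(x0, alphas_cumprod, weights)
```

In a real pipeline the stored maps would come from the attention layers of the pre-trained network, and how they are fused during editing is described in the full paper rather than in this excerpt.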

Citation impact

Total citations: 181
FWCI: 20.56
Percentile: 100%
References: 52

Authors

7

Topics & keywords

Keywords
  • Computer science
  • Image editing
  • Video editing
  • Randomness
  • Artificial intelligence
  • Shot (pellet)
  • Computer vision
  • Consistency (knowledge bases)