FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
Hong Kong University of Science and Technology · Tencent (China) · +1 more institution
Abstract
Diffusion-based generative models have achieved remarkable success in text-based image generation. However, because the generation process involves substantial randomness, it remains challenging to apply such models to real-world visual content editing, especially for videos. In this paper, we propose FateZero, a zero-shot text-based editing method for real-world videos that requires neither per-prompt training nor a user-specific mask. To edit videos consistently, we propose several techniques built on pre-trained models. First, in contrast to the straightforward DDIM inversion technique, our approach captures intermediate attention maps during inversion, which effectively retain both structural and motion information.…
Citation impact
- FWCI: 20.56
- Percentile: 100%
- References: 52
Authors
- 7
Topics & keywords
- Computer science
- Image editing
- Video editing
- Randomness
- Artificial intelligence
- Shot (pellet)
- Computer vision
- Consistency (knowledge bases)