Preprint · Oct 26, 2024 · Green OA
Proceedings of the 32nd ACM International Conference on Multimedia
Indexed in: arXiv, Crossref
Abstract
A plethora of text-guided image editing methods has recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models, especially Stable Diffusion. Despite the success of diffusion models in producing high-quality images, their application to small object generation has been limited by the difficulty of aligning cross-modal attention maps between the text and these objects. Our approach offers a training-free method that significantly mitigates this alignment issue through local and global attention guidance, enhancing the model's ability to accurately render small objects in accordance with textual descriptions. We detail the methodology in our approach, emphasizing…
Citation impact
- Total citations: 287
- References: 40
Authors: 8
Topics & keywords
- Computer science
- Multimedia
- Library science
- World Wide Web