Proceedings of the 32nd ACM International Conference on Multimedia

Pan, Qihe; Zhao, Zhen; Wang, Zicheng; Long, Sifan; Wu, Yiming; Ji, Wei; Liang, Haoran; Liang, Ronghua

doi:10.1145/3664647

Preprint, Oct 26, 2024 (Green OA)

Indexed in: arXiv, Crossref

Abstract

A plethora of text-guided image editing methods have recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models, especially Stable Diffusion. Despite the success of diffusion models in producing high-quality images, their application to small object generation has been limited by the difficulty of aligning cross-modal attention maps between text and these objects. Our approach offers a training-free method that significantly mitigates this alignment issue with local and global attention guidance, enhancing the model's ability to accurately render small objects in accordance with textual descriptions. We detail the methodology in our approach, emphasizing…

Citation impact

Total citations: 287

FWCI: —
Percentile: —
References: 40

Authors: 8

Topics & keywords

Topics

Keywords

Computer science
Multimedia
Library science
World Wide Web