Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models

Chefer, Hila; Alaluf, Yuval; Vinker, Yael; Wolf, Lior; Cohen‐Or, Daniel

doi:10.1145/3592116

Article · ACM Transactions on Graphics · Jul 26, 2023 · Closed access

Tel Aviv University

Abstract

Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt. While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt. We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt. Moreover, we find that in some cases the model also fails to correctly bind attributes (e.g., colors) to their corresponding subjects. To help mitigate these failure cases, we introduce the concept of Generative…
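The abstract is cut off before the method is described. As a rough illustration of the attention-based guidance named in the title, the sketch below computes a per-subject "excitation" loss from cross-attention maps: each subject token's map is Gaussian-smoothed, its peak value is taken, and the loss is the largest shortfall (1 - peak) over all subjects, so the most neglected subject dominates. This follows our reading of the full paper, not the authors' code. The input layout (one already-normalized spatial map per prompt token), the kernel size of 3 with sigma 0.5, and the helper names `smooth` and `excite_loss` are assumptions made for illustration. In a real pipeline this loss would be differentiated with respect to the noisy latent and used for a small gradient step on that latent during early denoising steps; that part needs an autodiff framework and is not shown.

```python
from typing import Sequence

import numpy as np


def gaussian_kernel_1d(size: int = 3, sigma: float = 0.5) -> np.ndarray:
    """Normalized 1-D Gaussian kernel; `size` should be odd (assumed default 3)."""
    half = size // 2
    x = np.arange(-half, half + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def smooth(attn_map: np.ndarray, size: int = 3, sigma: float = 0.5) -> np.ndarray:
    """Separable Gaussian blur of a 2-D attention map, edge-padded to keep its shape."""
    k = gaussian_kernel_1d(size, sigma)
    pad = size // 2
    padded = np.pad(attn_map, pad, mode="edge")
    # Blur each row, then each column; 'valid' mode trims the padding back off.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def excite_loss(token_maps: np.ndarray, subject_indices: Sequence[int]) -> float:
    """Hypothetical helper: max over subject tokens of (1 - peak smoothed attention).

    token_maps has shape (num_tokens, H, W); subject_indices picks the subject tokens.
    A subject whose attention never peaks anywhere yields a loss close to 1.
    """
    losses = [1.0 - smooth(token_maps[i]).max() for i in subject_indices]
    return float(max(losses))


if __name__ == "__main__":
    maps = np.zeros((3, 16, 16))
    maps[1, 4:8, 4:8] = 1.0  # subject 1: a sharp, well-attended region
    maps[2] = 0.1            # subject 2: weak, uniform attention (neglected)
    print(excite_loss(maps, [1, 2]))
```

With one sharply attended subject and one subject receiving only uniform attention of 0.1, the loss is about 0.9, driven entirely by the neglected subject. Taking the maximum over subjects (rather than the mean) is what makes the weakest subject the target of the update.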

Citation impact

362 total citations

FWCI: 41.12
Percentile: 100%
References: 20

Keywords

Computer science
Generative grammar
Generative model
Inference
Process (computing)
Image (mathematics)
Semantics (computer science)
Artificial intelligence