Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
Indexed in Crossref
Abstract
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt. While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt. We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt. Moreover, we find that in some cases the model also fails to correctly bind attributes (e.g., colors) to their corresponding subjects. To help mitigate these failure cases, we introduce the concept of Generative…
Citation impact
- Total citations: 362
- FWCI: 41.12
- Percentile: 100%
- References: 20
Authors: 5
Topics & keywords
Topics
Keywords
- Computer science
- Generative grammar
- Generative model
- Inference
- Process (computing)
- Image (mathematics)
- Semantics (computer science)
- Artificial intelligence