FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention
Massachusetts Institute of Technology · Nvidia (United States)
Abstract
Diffusion models excel at text-to-image generation, especially subject-driven generation of personalized images. However, existing methods are inefficient because they require subject-specific fine-tuning, which is computationally intensive and hampers deployment. Moreover, existing methods struggle with multi-subject generation because they often blend identities across subjects. We present FastComposer, which enables efficient, personalized, multi-subject text-to-image generation without fine-tuning. FastComposer uses subject embeddings extracted by an image encoder to augment the generic text conditioning in diffusion models, enabling personalized image generation based on subject images and textual…
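To make the idea of augmenting text conditioning with subject embeddings concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function name `augment_text_conditioning`, the `mix` weight, and the simple linear blend are all hypothetical stand-ins, since this excerpt does not say how the image-derived embeddings are fused with the text embeddings.

```python
from typing import Dict, List

Vector = List[float]


def augment_text_conditioning(
    token_embeddings: List[Vector],
    subject_embeddings: Dict[int, Vector],
    mix: float = 0.5,
) -> List[Vector]:
    """Blend image-derived subject embeddings into the text token embeddings.

    token_embeddings: one vector per prompt token (generic text conditioning).
    subject_embeddings: maps a token position to the embedding an image
        encoder would produce for that subject (hypothetical inputs here).
    mix: assumed blending weight; a linear blend is a stand-in for whatever
        fusion the actual method uses.
    """
    # Copy so the caller's text embeddings are left untouched.
    augmented = [list(vec) for vec in token_embeddings]
    for position, subject_vec in subject_embeddings.items():
        if not 0 <= position < len(augmented):
            raise IndexError(f"no token at position {position}")
        text_vec = augmented[position]
        if len(text_vec) != len(subject_vec):
            raise ValueError("text and subject embeddings differ in size")
        # Only subject tokens change; all other tokens keep their text embedding.
        augmented[position] = [
            (1 - mix) * t + mix * s for t, s in zip(text_vec, subject_vec)
        ]
    return augmented


# Toy prompt "a man and a woman": positions 1 and 4 are the subject tokens.
tokens = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
subjects = {1: [3.0, 2.0], 4: [2.0, 5.0]}
conditioning = augment_text_conditioning(tokens, subjects)
```

In this toy setup each subject gets its own token position, so two subjects in one prompt are conditioned separately; no fine-tuning step is involved, only a forward pass over the embeddings.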
Citation impact
- FWCI: 24.68
- Percentile: 100%
- References: 52
Authors: 5
Topics & keywords
- Artificial intelligence
- Pattern recognition (psychology)
- Computer science
- Computer vision
- Image (mathematics)
- Subject (documents)
- Image processing