T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models

Mou, Chong; Wang, Xintao; Xie, Liangbin; Wu, Yanze; Zhang, Jian; Qi, Zhongang; Shan, Ying

doi:10.1609/aaai.v38i5.28226

Article · Proceedings of the AAAI Conference on Artificial Intelligence · Mar 24, 2024 · Diamond OA

Peking University · Tencent (China) · +3 more institutions

Indexed in Crossref

Abstract

The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated a strong capacity for learning complex structures and meaningful semantics. However, relying solely on text prompts cannot take full advantage of the knowledge learned by the model, especially when flexible and accurate control (e.g., over structure and color) is needed. In this paper, we aim to "dig out" the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation at a finer granularity. Specifically, we propose to learn low-cost T2I-Adapters that align the internal knowledge in T2I models with external control signals, while freezing the original large T2I models. In this way, we can…

Citation impact

Total citations: 702

FWCI: 91.59
Percentile: 100%
References: 61

Authors: 7

Topics & keywords

Keywords

Adapter (computing)
Dig
Computer science
Image (mathematics)
Artificial intelligence
Computer graphics (images)
Computer hardware
World Wide Web

UN Sustainable Development Goals

Quality Education

Funding

National Natural Science Foundation of China
Award: 62372016