ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

Wei, Yuxiang; Zhang, Yabo; Ji, Zhilong; Bai, Jinfeng; Zhang, Lei; Zuo, Wangmeng

doi:10.1109/iccv51070.2023.01461

Article · Oct 1, 2023 · Closed access

Harbin Institute of Technology, Hong Kong Polytechnic University, and two other institutions

Indexed in Crossref

Abstract

In addition to their unprecedented ability in imaginative creation, large text-to-image models are expected to handle customized concepts in image generation. Existing works generally learn such concepts in an optimization-based manner, which brings excessive computation or memory burden. In this paper, we instead propose a learning-based encoder, consisting of a global and a local mapping network, for fast and accurate customized text-to-image generation. Specifically, the global mapping network projects the hierarchical features of a given image into multiple "new" words in the textual word embedding space, i.e., one primary word for the well-editable concept and other auxiliary words to exclude irrelevant…
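The abstract describes the global mapping network only at a high level. As a rough illustration, and not the authors' implementation, the sketch below assumes the image encoder returns one pooled feature vector per selected layer, maps each vector with its own small two-layer MLP into the word-embedding space, and treats the word from the first (assumed deepest) level as the primary word and the rest as auxiliary words. The names `GlobalMappingSketch` and `split_words`, the MLP shape, the dimensions, and the random initialization are all hypothetical choices made for illustration.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weight: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) weight matrix by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def relu(x: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


class GlobalMappingSketch:
    """Hypothetical global mapper: one small MLP per feature level, each
    producing one pseudo-word embedding. Depth and sizes are assumptions."""

    def __init__(self, num_levels: int, feat_dim: int, word_dim: int,
                 hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)

        def init(out_dim: int, in_dim: int) -> Matrix:
            # Small Gaussian weights; a trained model would learn these.
            return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                    for _ in range(out_dim)]

        # One (first layer, second layer) weight pair per feature level.
        self.levels = [(init(hidden, feat_dim), init(word_dim, hidden))
                       for _ in range(num_levels)]

    def __call__(self, level_features: List[Vector]) -> List[Vector]:
        if len(level_features) != len(self.levels):
            raise ValueError("expected one feature vector per level")
        # Each level's feature vector becomes one "new" word embedding.
        return [linear(w2, relu(linear(w1, feat)))
                for (w1, w2), feat in zip(self.levels, level_features)]


def split_words(words: List[Vector]) -> Tuple[Vector, List[Vector]]:
    """Assumption: index 0 (the deepest level) yields the primary word;
    the remaining words are treated as auxiliary."""
    return words[0], words[1:]


if __name__ == "__main__":
    rng = random.Random(1)
    # Five hypothetical feature levels, each a 16-dimensional vector.
    features = [[rng.random() for _ in range(16)] for _ in range(5)]
    mapper = GlobalMappingSketch(num_levels=5, feat_dim=16, word_dim=12)
    words = mapper(features)
    primary, auxiliary = split_words(words)
    print(len(words), len(primary), len(auxiliary))  # 5 12 4
```

Giving each feature level its own projection lets shallow and deep features contribute separate words, which matches the abstract's idea of hierarchical features mapped to multiple words; how the real model weights, selects, or discards the auxiliary words is not stated in the text above.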

Citation impact

Total citations: 208
FWCI (field-weighted citation impact): 23.68
Percentile: 100%
References: 59

Authors: 6

Topics & keywords

Keywords

Computer science
Encoder
Encoding (memory)
Embedding
Word embedding
Image (mathematics)
Word (group theory)
Artificial intelligence

Funding

National Natural Science Foundation of China