Toward Multimodal Image-to-Image Translation

Zhu, Jun-Yan; Zhang, Richard; Pathak, Deepak; Darrell, Trevor; Efros, Alexei A.; Wang, Oliver; Shechtman, Eli

doi:10.48550/arxiv.1711.11586

Preprint · arXiv (Cornell University) · Nov 30, 2017 · Green OA

Jun-Yan Zhu · Richard Zhang · Deepak Pathak · Trevor Darrell · Alexei A. Efros

University of California, Berkeley · Adobe Systems (United States)

Indexed in: arXiv, DataCite

Abstract

Many image-to-image translation problems are ambiguous, as a single input image may correspond to multiple possible outputs. In this work, we aim to model a distribution of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results.…
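The invertibility idea in the abstract can be illustrated with a toy sketch. The abstract as shown here is truncated and does not state the loss used, so the code below assumes an L1 penalty between a sampled latent code and the code recovered from the output. The functions `generator`, `encoder`, and `latent_recovery_loss` are hypothetical stand-ins, not the paper's networks, and scalars take the place of images.

```python
import random

def generator(x, z, latent_weight):
    # Hypothetical toy generator: channel 0 copies the input,
    # channel 1 adds variation driven by the latent code z.
    # latent_weight = 0.0 mimics a collapsed generator that ignores z.
    return [x, x + latent_weight * z]

def encoder(y):
    # Hypothetical toy encoder: tries to read the latent code
    # back from the output alone.
    return y[1] - y[0]

def latent_recovery_loss(latent_weight, num_samples=1000, seed=0):
    # Assumed L1 penalty between the sampled code and the recovered code,
    # averaged over random inputs and latent samples.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        z_hat = encoder(generator(x, z, latent_weight))
        total += abs(z - z_hat)
    return total / num_samples

loss_uses_latent = latent_recovery_loss(1.0)     # near zero: code is recoverable
loss_ignores_latent = latent_recovery_loss(0.0)  # large: many codes map to one output
```

In this sketch, a generator that ignores the latent code cannot have its code recovered, so the penalty stays high. Minimizing such a penalty therefore discourages the many-to-one mapping that the abstract describes as mode collapse.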

Citation impact

Total citations: 742

FWCI: —
Percentile: —
References: 57

Authors: 7

Topics & keywords

Keywords

Computer science
Image (mathematics)
Code (set theory)
Artificial intelligence
Translation (biology)
Encoding (memory)
Generator (circuit theory)
Consistency (knowledge bases)
