Article · Jan 1, 2022 · Gold OA

MedCLIP: Contrastive Learning from Unpaired Medical Images and Text

University of Illinois Urbana-Champaign · Adobe Systems (United States) · +1 more institution

Indexed in Crossref

Abstract

Existing vision-text contrastive learning methods such as CLIP (Radford et al., 2021) aim to match paired image and caption embeddings while pushing all other pairs apart, which improves representation transferability and supports zero-shot prediction. However, medical image-text datasets are orders of magnitude smaller than the general image-caption datasets collected from the internet. Moreover, previous methods encounter many false negatives: images and reports from different patients may carry the same semantics but are wrongly treated as negatives. In this paper, we decouple images and texts for multimodal contrastive learning, which scales the usable training data combinatorially at low cost. We also propose to…
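To make the paired objective and the false-negative problem concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss in plain Python. It is not the MedCLIP objective, which the truncated abstract does not spell out. The function names, the toy 2-D embeddings, and the temperature of 0.07 are all assumptions chosen for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def cross_entropy_row(logits, target):
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: pair k is the only positive for row/column k.

    Hypothetical helper for illustration; temperature is an assumed value.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]
    image_to_text = sum(cross_entropy_row(logits[k], k) for k in range(n)) / n
    text_to_image = sum(
        cross_entropy_row([logits[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)


# Two patients with different findings: the paired objective is easy to satisfy.
loss_distinct = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])

# Two patients with the same finding (toy stand-in for a false negative):
# each image matches the other patient's report equally well, yet the
# objective still treats that report as a negative.
loss_false_negative = clip_style_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
```

In the second batch every similarity is identical, so the loss is stuck at log 2 even though the embeddings are perfectly aligned semantically. This is the kind of penalty the abstract attributes to treating reports from separate patients as negatives.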


Funding