MedCLIP: Contrastive Learning from Unpaired Medical Images and Text

Wang, Zifeng; Wu, Zhenbang; Agarwal, D.C.; Sun, Jimeng

doi:10.18653/v1/2022.emnlp-main.256

Article · Jan 1, 2022 · Gold OA

University of Illinois Urbana-Champaign · Adobe Systems (United States) · +1 more institution

Indexed in Crossref

Abstract

Existing vision-text contrastive learning methods such as CLIP (Radford et al., 2021) match paired image and caption embeddings while pushing unpaired ones apart, which improves representation transferability and supports zero-shot prediction. However, medical image-text datasets are orders of magnitude smaller than the general image-caption datasets collected from the internet. Moreover, previous methods encounter many false negatives: images and reports from different patients may carry the same semantics but are wrongly treated as negatives. In this paper, we decouple images and texts for multimodal contrastive learning, which scales the usable training data combinatorially at low cost. We also propose to…
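To make the false-negative problem concrete, the following is a minimal sketch of the standard CLIP-style symmetric contrastive objective that the abstract refers to. It is not MedCLIP's proposed loss, which the truncated abstract does not describe. The function names, the toy two-dimensional embeddings, and the temperature of 0.07 are assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of (image, text) pairs.

    Pair k is the only positive for image k and for text k; every other
    item in the batch is treated as a negative. The temperature value is
    an assumption for this sketch.
    """
    n = len(image_embs)
    logits = [
        [cosine_similarity(img, txt) / temperature for txt in text_embs]
        for img in image_embs
    ]
    # Image-to-text direction: each row should peak on its diagonal entry.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: same computation over the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)


# Two patients with different findings: the pairs are easy to separate.
distinct_loss = clip_style_loss([[1.0, 0.0], [0.0, 1.0]],
                                [[1.0, 0.0], [0.0, 1.0]])

# Two patients with the same finding: each is a false negative for the
# other, so the loss cannot fall below log(2) however good the encoders are.
same_semantics_loss = clip_style_loss([[1.0, 0.0], [1.0, 0.0]],
                                      [[1.0, 0.0], [1.0, 0.0]])
```

In this toy setting `distinct_loss` is close to zero, while `same_semantics_loss` equals log(2): the objective penalizes the model for correctly giving two semantically identical reports the same embedding.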

Citation impact

Total citations: 574

FWCI: 49.12
Percentile: 100%
References: 47

Authors: 4

Topics & keywords

Keywords

Computer science
Artificial intelligence
Natural language processing
Information retrieval