MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
University of Illinois Urbana-Champaign · Adobe Systems (United States) · +1 more institution
Abstract
Existing vision-text contrastive learning methods such as CLIP (Radford et al., 2021) aim to match paired image and caption embeddings while pushing unpaired ones apart, which improves representation transferability and supports zero-shot prediction. However, medical image-text datasets are orders of magnitude smaller than the general image-caption data available on the internet. Moreover, previous methods encounter many false negatives: images and reports from different patients may carry the same semantics but are wrongly treated as negatives. In this paper, we decouple images and texts for multimodal contrastive learning, thereby scaling the usable training data combinatorially at low cost. We also propose to…
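The CLIP-style objective and the false-negative problem described above can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not MedCLIP's own loss (the abstract is truncated before describing it): image and text embeddings are assumed to be precomputed, the temperature of 0.07 is a hypothetical default, and integer "diagnosis labels" are a hypothetical stand-in for "carrying the same semantics". The loss treats only the diagonal of the image-text similarity matrix as positives, so any off-diagonal pair that shares a label is still pushed apart, which is exactly the false negative the abstract refers to.

```python
import numpy as np


def l2_normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cross_entropy(logits, targets):
    # Numerically stable softmax cross-entropy, averaged over rows.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE: row i of img_emb is paired only with row i of txt_emb.
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature       # (N, N) similarity matrix
    targets = np.arange(len(logits))         # positives lie on the diagonal
    loss_i2t = cross_entropy(logits, targets)    # image -> text direction
    loss_t2i = cross_entropy(logits.T, targets)  # text -> image direction
    return 0.5 * (loss_i2t + loss_t2i)


def count_false_negatives(labels):
    # labels: hypothetical per-pair semantic label (e.g. a diagnosis code).
    # Counts off-diagonal (image i, text j) pairs that share a label but
    # are treated as negatives by clip_contrastive_loss.
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    return int(same.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(4, 8))
    txt = img + 0.1 * rng.normal(size=(4, 8))  # roughly aligned pairs
    print("loss:", clip_contrastive_loss(img, txt))
    # Patients 0 and 2 share a label, giving two false-negative pairs.
    print("false negatives:", count_false_negatives(np.array([0, 1, 0, 2])))
```

With labels `[0, 1, 0, 2]`, pairs (0, 2) and (2, 0) share semantics yet sit off the diagonal, so `count_false_negatives` returns 2; in a batch drawn from a dataset with few distinct findings, such pairs can become common.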
Citation impact
- FWCI: 49.12
- Percentile: 100%
- References: 47
Authors: 4
Topics & keywords
- Computer science
- Artificial intelligence
- Natural language processing
- Information retrieval