Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis

Vanderbilt University

Indexed in Crossref

Abstract

Vision Transformers (ViTs) have shown great performance in self-supervised learning of global and local representations that can be transferred to downstream applications. Inspired by these results, we introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis. Specifically, we propose: (i) a new 3D transformer-based model, dubbed Swin UNEt TRansformers (Swin UNETR), with a hierarchical encoder for self-supervised pre-training; (ii) tailored proxy tasks for learning the underlying pattern of human anatomy. We demonstrate successful pre-training of the proposed model on 5,050 publicly available computed tomography (CT) images from various body organs. The…

Citation impact

  • Total citations: 751
  • FWCI: 39.60
  • Percentile: 100%
  • References: 84

Authors

8 authors

Topics & keywords

Keywords
  • Artificial intelligence
  • Computer science
  • Segmentation
  • Transformer
  • Encoder
  • Machine learning
  • Engineering