EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Xiong, Yunyang; Varadarajan, Bala; Wu, Lemeng; Xiang, Xiaoyu; Xiao, Fanyi; Zhu, Chenchen; Dai, Xiaoliang; Wang, Dilin; Sun, Fei; Iandola, Forrest; Krishnamoorthi, Raghuraman; Chandra, Vikas

doi:10.1109/cvpr52733.2024.01525

Article | Jun 16, 2024 | Closed access

Indexed in Crossref

Abstract

The Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications. A key component driving its impressive zero-shot transfer performance and high versatility is a very large Transformer model trained on the extensive, high-quality SA-1B dataset. While beneficial, the huge computation cost of SAM has limited its use in wider real-world applications. To address this limitation, we propose EfficientSAMs, lightweight SAM models that exhibit decent performance with largely reduced complexity. Our idea is based on leveraging masked image pretraining, SAMI, which learns to reconstruct features from the SAM image encoder for effective visual representation learning.…
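The feature-reconstruction idea in the abstract can be illustrated with a minimal toy sketch. This is not the authors' implementation: the random "teacher" features stand in for the output of a frozen SAM image encoder, and the 75% mask ratio, the mean-squared-error loss, and the helper names `random_mask`, `toy_student`, and `reconstruction_loss` are all assumptions made for illustration.

```python
import random


def random_mask(num_patches, mask_ratio, rng):
    """Pick which patch indices are hidden from the student (assumed random masking)."""
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))


def toy_student(teacher_feats, masked):
    """Hypothetical untrained student: exact on visible patches, zeros on masked ones."""
    hidden = set(masked)
    return [
        [0.0] * len(vec) if i in hidden else list(vec)
        for i, vec in enumerate(teacher_feats)
    ]


def reconstruction_loss(teacher_feats, student_feats):
    """Mean squared error between teacher and student features over all patches."""
    total, count = 0.0, 0
    for t_vec, s_vec in zip(teacher_feats, student_feats):
        for t, s in zip(t_vec, s_vec):
            total += (t - s) ** 2
            count += 1
    return total / count


rng = random.Random(0)
num_patches, dim = 16, 4

# Stand-in for per-patch features from a frozen SAM image encoder (assumption).
teacher_feats = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_patches)]

masked = random_mask(num_patches, mask_ratio=0.75, rng=rng)
student_feats = toy_student(teacher_feats, masked)

# Pretraining would minimize this value by updating the student's weights.
loss = reconstruction_loss(teacher_feats, student_feats)
print(f"masked {len(masked)} of {num_patches} patches, loss = {loss:.4f}")
```

In a real setup the student would be a lightweight encoder with a decoder trained by gradient descent; the sketch only shows what quantity such training would drive toward zero.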

Citation impact

Total citations: 189
FWCI: 43.03
Percentile: 100%
References: 94

Authors: 12

Topics & keywords

Computer science
Image (mathematics)
Artificial intelligence
Computer vision