SAM3-Adapter: Efficient Adaptation of Segment Anything 3 for Camouflage Object Segmentation, Shadow Detection, and Medical Image Segmentation

📅 2025-11-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing foundation models (e.g., SAM/SAM2) exhibit limited generalization and insufficient accuracy on fine-grained low-level vision tasks such as camouflaged object detection, shadow detection, and medical image segmentation. To address this, we propose SAM3-Adapter, the first lightweight adapter framework tailored for Segment Anything 3 (SAM3), which enhances fine structural modeling while preserving SAM3's strong generalization capability. Our approach introduces modular, parameter-efficient adapters coupled with task-aware training strategies, enabling unified multi-task segmentation with minimal computational overhead and flexible deployment. Evaluated on four challenging fine-grained segmentation benchmarks, SAM3-Adapter consistently outperforms SAM/SAM2 and their existing adapter variants, achieving state-of-the-art performance. Comprehensive experiments demonstrate its superior accuracy, robustness, and efficiency, validating its effectiveness for demanding low-level vision applications.

๐Ÿ“ Abstract
The rapid rise of large-scale foundation models has reshaped the landscape of image segmentation, with models such as Segment Anything achieving unprecedented versatility across diverse vision tasks. However, previous generations, including SAM and its successor, still struggle with fine-grained, low-level segmentation challenges such as camouflaged object detection, medical image segmentation, cell image segmentation, and shadow detection. To address these limitations, we originally proposed SAM-Adapter in 2023, demonstrating substantial gains on these difficult scenarios. With the emergence of Segment Anything 3 (SAM3), a more efficient and higher-performing evolution with a redesigned architecture and improved training pipeline, we revisit these long-standing challenges. In this work, we present SAM3-Adapter, the first adapter framework tailored for SAM3 that unlocks its full segmentation capability. SAM3-Adapter not only reduces computational overhead but also consistently surpasses both SAM and SAM2-based solutions, establishing new state-of-the-art results across multiple downstream tasks, including medical imaging, camouflaged (concealed) object segmentation, and shadow detection. Built upon the modular and composable design philosophy of the original SAM-Adapter, SAM3-Adapter provides stronger generalizability, richer task adaptability, and significantly improved segmentation precision. Extensive experiments confirm that integrating SAM3 with our adapter yields superior accuracy, robustness, and efficiency compared to all prior SAM-based adaptations. We hope SAM3-Adapter can serve as a foundation for future research and practical segmentation applications. Code, pre-trained models, and data processing pipelines are available.
Problem

Research questions and friction points this paper is trying to address.

Enhancing SAM3 for fine-grained segmentation tasks like camouflaged objects
Improving segmentation accuracy in medical imaging and shadow detection
Reducing computational overhead while boosting model adaptability and precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapter framework tailored for Segment Anything 3
Reduces computational overhead while improving performance
Enhances segmentation precision across multiple downstream tasks
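The parameter-efficient adapter idea summarized above can be sketched as a small residual bottleneck module attached to a frozen backbone: features are down-projected, passed through a nonlinearity, up-projected, and added back, so only the adapter's few parameters are trained. The class name, dimensions, and zero-initialization below are illustrative assumptions, not the paper's actual SAM3-Adapter implementation.

```python
import numpy as np

class BottleneckAdapter:
    """Minimal sketch of a parameter-efficient bottleneck adapter
    (illustrative only; not the paper's architecture)."""

    def __init__(self, dim: int, bottleneck: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Small random down-projection into the bottleneck.
        self.w_down = rng.normal(0.0, 0.02, (dim, bottleneck))
        # Zero-initialized up-projection: the adapter starts as an
        # identity mapping, so the frozen backbone is undisturbed.
        self.w_up = np.zeros((bottleneck, dim))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = np.maximum(x @ self.w_down, 0.0)  # ReLU bottleneck
        return x + h @ self.w_up              # residual: frozen feature + delta

# Frozen backbone features (batch of 2 tokens, 768-dim, values arbitrary).
x = np.ones((2, 768))
adapter = BottleneckAdapter(dim=768, bottleneck=32)
y = adapter(x)
# Zero-initialized up-projection leaves features unchanged before training.
assert np.allclose(y, x)
```

Only `w_down` and `w_up` would receive gradients during fine-tuning; the backbone producing `x` stays frozen, which is what keeps the computational and memory overhead low.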
Tianrun Chen
Zhejiang University
Computer Vision · 3D Reconstruction · Computational Imaging · Large Vision-Language Model
Runlong Cao
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, P.R. China.
Xinda Yu
School of Information Engineering, Huzhou University, Huzhou, P.R. China.
Lanyun Zhu
NTU, CityUHK, SUTD, BUAA
Multimodal Learning · Computer Vision · Resource-efficient Learning · Large Vision-Language Model
Chaotao Ding
KOKONI, Moxin (Huzhou) Tech. Co., LTD, Huzhou, Zhejiang, P.R. China.
Deyi Ji
Tencent; USTC Ph.D.
Multimodal LLM · Computer Vision · NLP
Cheng Chen
College of Computing and Data Science, Nanyang Technological University, Singapore.
Qi Zhu
School of Information Science and Technology, University of Science and Technology of China, P.R. China.
Chunyan Xu
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, P.R. China.
Papa Mao
KOKONI, Moxin (Huzhou) Tech. Co., LTD, Huzhou, Zhejiang, P.R. China.
Ying Zang
School of Information Engineering, Huzhou University, Huzhou, P.R. China.