🤖 AI Summary
To address the high deployment cost and excessive GPU memory consumption of Segment Anything Model 3 (SAM3) in downstream tasks, this work proposes a lightweight adaptation framework. The method keeps SAM3's image encoder frozen and introduces parameter-efficient adapter modules, coupled with a streamlined U-Net-style decoder, substantially reducing computational and memory overhead. Evaluated on mirror detection and salient object detection, the framework achieves state-of-the-art performance, outperforming baselines such as SAM2-UNet, while requiring less than 6 GB of GPU memory for training (batch size = 12), over a 60% reduction compared to full SAM3 fine-tuning. The core contribution is the first efficient multi-task transfer adaptation of SAM3, achieving a favorable trade-off among accuracy, inference speed, and resource efficiency, and thereby enabling practical deployment on edge devices.
📝 Abstract
In this paper, we introduce SAM3-UNet, a simplified variant of Segment Anything Model 3 (SAM3), designed to adapt SAM3 for downstream tasks at a low cost. Our SAM3-UNet consists of three components: a SAM3 image encoder, a simple adapter for parameter-efficient fine-tuning, and a lightweight U-Net-style decoder. Preliminary experiments on multiple tasks, such as mirror detection and salient object detection, demonstrate that the proposed SAM3-UNet outperforms the prior SAM2-UNet and other state-of-the-art methods, while requiring less than 6 GB of GPU memory during training with a batch size of 12. The code is publicly available at https://github.com/WZH0120/SAM3-UNet.
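The three-component layout described above (frozen SAM3 image encoder, small trainable adapter, lightweight U-Net-style decoder) can be sketched in PyTorch. This is a minimal illustration, not the released implementation: the encoder stand-in, channel sizes, and the bottleneck adapter design are assumptions for the sake of a runnable example.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter (assumed design): down-project, nonlinearity,
    up-project, with a residual connection. Only these few parameters train."""
    def __init__(self, channels, bottleneck=16):
        super().__init__()
        self.down = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.up = nn.Conv2d(bottleneck, channels, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

class SAM3UNetSketch(nn.Module):
    """Illustrative three-part layout: frozen encoder + adapter + light decoder."""
    def __init__(self, encoder, channels):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # the SAM3 image encoder stays frozen
        self.adapter = Adapter(channels)  # parameter-efficient fine-tuning
        self.decoder = nn.Sequential(     # stand-in for the U-Net-style head
            nn.Conv2d(channels, channels // 2, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels // 2, 1, kernel_size=1),  # 1-channel mask logits
        )

    def forward(self, x):
        feat = self.encoder(x)      # frozen backbone features
        feat = self.adapter(feat)   # cheap task-specific adaptation
        return self.decoder(feat)   # segmentation logits

# A toy convolution stands in for the real (much larger) SAM3 encoder.
model = SAM3UNetSketch(nn.Conv2d(3, 64, kernel_size=3, padding=1), channels=64)
```

Because the backbone gradients are never computed, the trainable-parameter count (and hence optimizer state and activation memory) stays small, which is the mechanism behind the low training-memory footprint reported above.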