Depthwise-Dilated Convolutional Adapters for Medical Object Tracking and Segmentation Using the Segment Anything Model 2

📅 2025-07-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical video segmentation suffers from poor generalization, heavy data dependency, and catastrophic forgetting. To address these challenges, this paper proposes DD-SAM2, a framework that integrates adapter mechanisms into SAM2 for medical video analysis; the authors describe it as an initial attempt at systematically exploring adapter-based SAM2 fine-tuning in this setting. Specifically, the authors design a Depthwise-Dilated Adapter (DD-Adapter) based on depthwise separable dilated convolutions to enhance multi-scale feature extraction, and couple it with SAM2’s streaming memory architecture to segment and track objects across consecutive frames with limited training data. The approach uses parameter-efficient fine-tuning, which reduces computational overhead and lowers the risk of catastrophic forgetting. Evaluated on TrackRad2025 (tumor segmentation) and EchoNet-Dynamic (left ventricle tracking), DD-SAM2 achieves Dice scores of 0.93 and 0.97, respectively, which the authors report as superior to existing methods.

📝 Abstract
Recent advances in medical image segmentation have been driven by deep learning; however, most existing methods remain limited by modality-specific designs and exhibit poor adaptability to dynamic medical imaging scenarios. The Segment Anything Model 2 (SAM2) and its related variants, which introduce a streaming memory mechanism for real-time video segmentation, present new opportunities for prompt-based, generalizable solutions. Nevertheless, adapting these models to medical video scenarios typically requires large-scale datasets for retraining or transfer learning, leading to high computational costs and the risk of catastrophic forgetting. To address these challenges, we propose DD-SAM2, an efficient adaptation framework for SAM2 that incorporates a Depthwise-Dilated Adapter (DD-Adapter) to enhance multi-scale feature extraction with minimal parameter overhead. This design enables effective fine-tuning of SAM2 on medical videos with limited training data. Unlike existing adapter-based methods focused solely on static images, DD-SAM2 fully exploits SAM2's streaming memory for medical video object tracking and segmentation. Comprehensive evaluations on TrackRad2025 (tumor segmentation) and EchoNet-Dynamic (left ventricle tracking) datasets demonstrate superior performance, achieving Dice scores of 0.93 and 0.97, respectively. To the best of our knowledge, this work provides an initial attempt at systematically exploring adapter-based SAM2 fine-tuning for medical video segmentation and tracking. Code, datasets, and models will be publicly available at https://github.com/apple1986/DD-SAM2.
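The Dice scores quoted above measure the overlap between a predicted mask and a reference mask. As a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), the snippet below computes it for two small binary masks. The function name `dice_score`, the toy masks, and the choice to return 1.0 when both masks are empty are illustrative assumptions; the paper's own evaluation code may differ in details such as smoothing terms or per-frame averaging.

```python
def dice_score(pred, target):
    """Dice coefficient for two binary masks given as nested lists.

    Assumption for this sketch: if both masks are empty, the score
    is defined as 1.0 (perfect agreement). Other conventions exist.
    """
    # Flatten both masks and coerce every pixel to 0 or 1.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # |A ∩ B|: pixels that are foreground in both masks.
    intersection = sum(p & t for p, t in zip(flat_pred, flat_target))
    # |A| + |B|: total foreground pixels across both masks.
    total = sum(flat_pred) + sum(flat_target)
    if total == 0:
        return 1.0
    return 2 * intersection / total


# Toy example (hypothetical data): 3 predicted pixels, 2 reference pixels,
# 2 of them shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 1],
          [0, 0, 0]]
print(dice_score(pred, target))  # 0.8
```

For video data, one common choice is to compute this score per frame and average over the sequence; whether the reported 0.93 and 0.97 are aggregated that way is not stated in the abstract.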

Problem

Research questions and friction points this paper is trying to address.

Adapting SAM2 for medical video tracking and segmentation
Reducing computational costs in medical image processing
Enhancing feature extraction with minimal parameter overhead

Innovation

Methods, ideas, or system contributions that make the work stand out.

Depthwise-Dilated Adapter enhances feature extraction
Efficient SAM2 fine-tuning with minimal parameters
Exploits streaming memory for medical video tasks
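To illustrate why depthwise separable dilated convolutions suit a low-parameter adapter, the sketch below compares weight counts for a standard convolution and a depthwise separable one, and shows how dilation widens the effective kernel without adding weights. The channel width (256), kernel size (3), and dilation rates (1, 2, 3) are assumptions chosen for illustration, not values taken from the paper, and the helper names are hypothetical. Bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def effective_kernel_size(kernel, dilation):
    """Spatial extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


# Illustrative settings (assumptions, not from the paper).
channels, kernel = 256, 3
standard = standard_conv_params(kernel, channels, channels)
separable = depthwise_separable_params(kernel, channels, channels)
print(f"standard conv weights:       {standard}")   # 589824
print(f"depthwise separable weights: {separable}")  # 67840

# Dilation enlarges the receptive field; the weight count stays the same.
for dilation in (1, 2, 3):
    size = effective_kernel_size(kernel, dilation)
    print(f"dilation {dilation}: effective kernel {size} x {size}")
```

Under these assumed settings the separable form needs roughly one ninth of the weights, and running several dilation rates in parallel would cover 3 x 3, 5 x 5, and 7 x 7 neighborhoods at the same per-branch cost. That is one plausible reading of "multi-scale feature extraction with minimal parameter overhead"; the exact adapter layout is defined in the paper and its repository.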
Guoping Xu
UTSW, WIT
Medical Image Segmentation, Disease Quantification, Computer Vision
Christopher Kabat
The Medical Artificial Intelligence and Automation (MAIA) Laboratory, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
You Zhang
The Medical Artificial Intelligence and Automation (MAIA) Laboratory, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA