SAM3-DMS: Decoupled Memory Selection for Multi-target Video Segmentation of SAM3

📅 2026-01-14
🤖 AI Summary
This work addresses the limitations of the group-level unified memory selection mechanism in SAM3 for multi-object video segmentation, which relies on average target performance and neglects individual reliability, leading to insufficient tracking stability in high-density scenarios. To overcome this, we propose a training-free, object-level decoupled memory selection strategy that enables fine-grained, independent memory management for each target within the SAM3 framework, thereby eliminating the constraints of synchronized decision-making. The proposed method significantly enhances identity preservation and tracking robustness, with performance gains becoming more pronounced as target density increases.

📝 Abstract
Segment Anything 3 (SAM3) has established a powerful foundation that robustly detects, segments, and tracks specified targets in videos. However, in its original implementation, its group-level collective memory selection is suboptimal for complex multi-object scenarios, as it employs a synchronized decision across all concurrent targets conditioned on their average performance, often overlooking individual reliability. To this end, we propose SAM3-DMS, a training-free decoupled strategy that utilizes fine-grained memory selection on individual objects. Experiments demonstrate that our approach achieves robust identity preservation and tracking stability. Notably, our advantage becomes more pronounced with increased target density, establishing a solid foundation for simultaneous multi-target video segmentation in the wild.
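To make the contrast between group-level and decoupled selection concrete, here is a minimal sketch. It is not the SAM3 or SAM3-DMS implementation: the per-object reliability scores, the threshold value, and the function names `group_level_select` and `decoupled_select` are all hypothetical, chosen only to illustrate how a decision based on the average can admit an unreliable frame for one target while a per-object decision does not.

```python
from typing import Dict


def group_level_select(scores: Dict[int, float], threshold: float) -> Dict[int, bool]:
    """Synchronized decision (illustrative): all objects share one verdict
    derived from the mean of their scores on this frame."""
    mean_score = sum(scores.values()) / len(scores)
    keep = mean_score >= threshold
    return {obj_id: keep for obj_id in scores}


def decoupled_select(scores: Dict[int, float], threshold: float) -> Dict[int, bool]:
    """Object-level decision (illustrative): each object is judged on its
    own score, independently of the others."""
    return {obj_id: score >= threshold for obj_id, score in scores.items()}


# Hypothetical reliability scores for three tracked objects on one frame.
# Object 3 is poorly segmented (for example, occluded).
frame_scores = {1: 0.95, 2: 0.90, 3: 0.20}
threshold = 0.5  # assumed value, for illustration only

group = group_level_select(frame_scores, threshold)
decoupled = decoupled_select(frame_scores, threshold)

print("group-level:", group)
print("decoupled:  ", decoupled)
```

In this toy case the mean score (about 0.68) clears the threshold, so the group-level rule writes the frame into memory for all three objects, including the unreliable object 3. The decoupled rule keeps the frame for objects 1 and 2 and skips it for object 3, which is the kind of individual reliability the abstract says the synchronized decision overlooks.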

Problem

Research questions and friction points this paper is trying to address.

multi-target video segmentation
memory selection
object tracking
Segment Anything Model
identity preservation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled Memory Selection
Multi-target Video Segmentation
SAM3
Fine-grained Memory
Training-free Strategy
Rui-Yang Shen
Fudan University
Chang Liu
Shanghai University of Finance and Economics
Henghui Ding
Fudan University
Computer Vision, Machine Learning, Segmentation, AIGC