Re-Prompting SAM 3 via Object Retrieval: 3rd of the 5th PVUW MOSE Track

📅 2026-03-24
🤖 AI Summary
This work addresses the challenges of tracking robustness in semi-supervised video object segmentation caused by object disappearance and reappearance, severe deformations, and interference from similar-looking objects. To overcome these issues, we propose an automatic re-prompting framework built upon SAM 3. Our approach detects candidate objects of the same category in subsequent frames, performs object-level matching using DINOv3, and retrieves reliable anchor points from a transformation-aware feature pool. These anchors, together with the initial-frame mask, are jointly injected into the tracker to enable multi-anchor propagation. By moving beyond the conventional reliance on initial prompts alone, our method significantly improves segmentation performance under deformation, occlusion, and semantic interference, achieving a J&F score of 51.17% on the MOSEv2 test set and ranking third in the competition track.

📝 Abstract
This technical report explores the MOSEv2 track of the PVUW 2026 Challenge, which targets complex semi-supervised video object segmentation. Built on SAM 3, we develop an automatic re-prompting framework to improve robustness under target disappearance and reappearance, severe transformation, and strong same-category distractors. Our method first applies the SAM 3 detector to later frames to identify same-category object candidates, and then performs DINOv3-based object-level matching against a transformation-aware target feature pool to retrieve reliable target anchors. These anchors are injected back into the SAM 3 tracker together with the first-frame mask, enabling multi-anchor propagation rather than relying solely on the initial prompt. This simple design directly addresses several core challenges of MOSEv2. Our solution achieves a J&F of 51.17% on the test set, ranking 3rd in the MOSEv2 track.
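The retrieval step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes object embeddings (for example, from DINOv3) have already been computed for the target feature pool and for each detected candidate, and it matches them by cosine similarity. The function name `retrieve_anchors` and the `threshold` and `margin` parameters are hypothetical; the margin test against the runner-up candidate is an added assumption meant to reject ambiguous same-category distractors.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    x = np.asarray(x, dtype=np.float64)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def retrieve_anchors(pool, candidates, threshold=0.6, margin=0.1):
    """Pick at most one anchor candidate per frame (hypothetical helper).

    pool:       (P, D) embeddings of the target under different appearances
                (assumed precomputed).
    candidates: dict mapping frame index -> (N, D) embeddings of the
                same-category detections in that frame (assumed precomputed).
    threshold:  assumed minimum cosine similarity for a reliable match.
    margin:     assumed minimum gap between the best and second-best candidate.

    Returns a list of (frame_idx, candidate_idx, score) tuples.
    """
    pool = l2_normalize(pool)
    anchors = []
    for frame_idx in sorted(candidates):
        if len(candidates[frame_idx]) == 0:
            continue  # nothing was detected in this frame
        feats = l2_normalize(candidates[frame_idx])
        sims = feats @ pool.T            # (N, P) cosine similarities
        scores = sims.max(axis=1)        # best pool match for each candidate
        order = np.argsort(scores)[::-1]
        best = int(order[0])
        runner_up = scores[order[1]] if len(order) > 1 else -1.0
        # Accept only confident, unambiguous matches.
        if scores[best] >= threshold and scores[best] - runner_up >= margin:
            anchors.append((frame_idx, best, float(scores[best])))
    return anchors
```

In a full pipeline, the masks of the accepted candidates would then be passed to the tracker as additional prompts alongside the first-frame mask; that step depends on the tracker's interface and is omitted here.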
Problem

Research questions and friction points this paper is trying to address.

video object segmentation
semi-supervised learning
object disappearance and reappearance
severe transformation
same-category distractors
Innovation

Methods, ideas, or system contributions that make the work stand out.

re-prompting
object retrieval
multi-anchor propagation
SAM 3
video object segmentation