🤖 AI Summary
This work addresses two key challenges in visual object tracking (VOT): weak long-term temporal consistency and insufficient cross-modal fusion. To this end, it introduces the Segment Anything Model 2 (SAM2) to VOT for the first time. The proposed method features a dynamic memory prompting mechanism that enables online temporal prompt updating and cross-modal feature alignment. It further integrates contrastive learning-guided mask refinement with multimodal temporal memory enhancement to significantly improve tracking robustness and accuracy. By preserving SAM2's strong generalization capability, the approach effectively models target evolution and modality complementarity. On the 2024 ICPR Multi-modal Object Tracking challenge, the method achieves a first-place AUC of 89.4, substantially outperforming baseline approaches.
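The dynamic memory prompting mechanism is not detailed in this summary; the sketch below is a minimal, hypothetical illustration of what online temporal prompt updating can look like: a bounded memory of recent high-confidence target masks is kept and used to prompt the segmenter on the next frame. The `segment_with_prompts` callable, the memory size, and the confidence threshold are assumptions for illustration, not the authors' implementation.

```python
from collections import deque
import numpy as np

def mask_to_box(mask: np.ndarray):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a binary mask, or None if empty."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return float(xs.min()), float(ys.min()), float(xs.max()), float(ys.max())

def track_with_dynamic_memory(frames, init_mask, segment_with_prompts,
                              mem_size=7, conf_thresh=0.8):
    """Hypothetical online tracking loop with a bounded temporal memory.

    segment_with_prompts(frame, memory) -> (mask, confidence) is a stand-in for a
    SAM2-style prompted segmenter; `memory` holds recent reliable (frame, mask)
    pairs that act as temporal prompts for the next frame.
    """
    memory = deque(maxlen=mem_size)
    memory.append((frames[0], init_mask))      # the first-frame annotation always stays in memory
    results = [mask_to_box(init_mask)]

    for frame in frames[1:]:
        mask, conf = segment_with_prompts(frame, list(memory))
        results.append(mask_to_box(mask))
        if conf >= conf_thresh:                 # only trustworthy predictions update the memory
            memory.append((frame, mask))
    return results
```

The key design choice illustrated here is that the memory is updated online but gated by confidence, so drift from unreliable frames is less likely to contaminate future prompts.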
📝 Abstract
We present an effective approach for adapting the Segment Anything Model 2 (SAM2) to the Visual Object Tracking (VOT) task. Our method leverages the powerful pre-trained capabilities of SAM2 and incorporates several key techniques to enhance its performance on VOT. Combining SAM2 with our proposed optimizations, we achieved a first-place AUC score of 89.4 on the 2024 ICPR Multi-modal Object Tracking challenge, demonstrating the effectiveness of our approach. This paper details our methodology, the specific enhancements made to SAM2, and a comprehensive analysis of our results in the context of existing VOT solutions and the multi-modal nature of the dataset.
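For context on the reported metric: VOT benchmarks typically report AUC as the area under the success curve, i.e. the fraction of frames whose predicted-box IoU with the ground truth exceeds a threshold, averaged over thresholds. Since SAM2 predicts masks, adapting it to a box-based benchmark usually involves converting masks to axis-aligned boxes (as in `mask_to_box` above). The snippet below is a small, self-contained sketch of that evaluation under these assumptions; it is illustrative and not the challenge's official scoring code.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def success_auc(pred_boxes, gt_boxes, thresholds=np.linspace(0.0, 1.0, 21)):
    """Area under the success curve: mean fraction of frames whose IoU exceeds each threshold."""
    ious = np.array([box_iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    success = [(ious > t).mean() for t in thresholds]
    return float(np.mean(success))
```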