Distractor-Aware Memory-Based Visual Object Tracking

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing memory-based video segmentation models such as SAM2 are susceptible to distractors in visual object tracking, leading to tracking drift and poor re-detection after occlusion. To address this, the paper proposes DAM4SAM: a distractor-aware drop-in memory module coupled with an introspection-based memory management mechanism, which improves distractor discrimination and robustness to occlusion. The authors also construct DiDi, a Distractor-Distilled dataset for analyzing tracking in the presence of distractors. The proposed memory generalizes across architectures: DAM4SAM outperforms SAM2.1 on all thirteen evaluation benchmarks and sets new state-of-the-art results on ten, while integration into the efficient trackers EfficientTAM and EdgeTAM yields substantial gains (+11% and +4%, respectively), approaching the accuracy of non-real-time models.

📝 Abstract
The recent emergence of memory-based video segmentation methods such as SAM2 has led to models with excellent performance in segmentation tasks, achieving leading results on numerous benchmarks. However, these models are not fully adapted for visual object tracking, where distractors (i.e., objects visually similar to the target) pose a key challenge. In this paper we propose a distractor-aware drop-in memory module and an introspection-based management method for SAM2, leading to DAM4SAM. Our design effectively reduces tracking drift toward distractors and improves redetection capability after object occlusion. To facilitate the analysis of tracking in the presence of distractors, we construct DiDi, a Distractor-Distilled dataset. DAM4SAM outperforms SAM2.1 on thirteen benchmarks and sets new state-of-the-art results on ten. Furthermore, integrating the proposed distractor-aware memory into the real-time tracker EfficientTAM leads to an 11% improvement and matches the tracking quality of the non-real-time SAM2.1-L on multiple tracking and segmentation benchmarks, while integration with the edge-based tracker EdgeTAM delivers a 4% performance boost, demonstrating very good generalization across architectures.
Problem

Research questions and friction points this paper is trying to address.

Reducing tracking drift caused by distractors
Improving object redetection after occlusion
Enhancing visual object tracking performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distractor-aware drop-in memory module
Introspection-based management method
DiDi, a Distractor-Distilled dataset for analyzing tracking under distractors
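The two core ideas above, a distractor-aware memory and introspective memory management, can be illustrated with a toy sketch. This is a hypothetical simplification, not the paper's implementation: DAM4SAM operates on SAM2 feature maps inside the tracker, whereas here frames are plain vectors, the `confidence` score is supplied by the caller, and all class and parameter names are illustrative.

```python
import numpy as np

class DistractorAwareMemory:
    """Toy sketch of a distractor-aware memory bank (illustrative only).

    A frame embedding is admitted to the target memory only if the tracker
    is confident in it (introspection) and it does not resemble a known
    distractor (distractor awareness).
    """

    def __init__(self, capacity=5, conf_threshold=0.8, distractor_sim=0.9):
        self.capacity = capacity            # max frames kept in target memory
        self.conf_threshold = conf_threshold
        self.distractor_sim = distractor_sim
        self.target_memory = []             # embeddings of confidently tracked frames
        self.distractor_memory = []         # embeddings of rejected look-alikes

    @staticmethod
    def _cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def add_distractor(self, embedding):
        self.distractor_memory.append(np.asarray(embedding, dtype=float))

    def update(self, embedding, confidence):
        """Introspective update: store only confident, non-distractor frames."""
        emb = np.asarray(embedding, dtype=float)
        # Self-reflection: skip frames the tracker itself is unsure about.
        if confidence < self.conf_threshold:
            return False
        # Distractor check: skip frames that resemble a known distractor.
        for d in self.distractor_memory:
            if self._cos(emb, d) > self.distractor_sim:
                return False
        self.target_memory.append(emb)
        if len(self.target_memory) > self.capacity:
            self.target_memory.pop(0)       # evict the oldest entry
        return True
```

A gating rule of this kind prevents the memory from absorbing frames that would later pull attention toward look-alike objects, which is the drift failure mode the paper targets.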
Jovana Videnovic
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, 1000, Slovenia
Matej Kristan
Full Professor at Faculty of computer and information science, University of Ljubljana
Computer vision · Machine learning · Pattern recognition
Alan Lukezic
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, 1000, Slovenia