🤖 AI Summary
This work addresses the challenging task of tracking fast-moving objects (FMOs). We introduce the first fine-grained benchmarking framework tailored to the FMOX dataset, systematically evaluating four SAM2-based video trackers: SAM2, EfficientTAM, DAM4SAM, and SAMURAI. Methodologically, we propose a three-dimensional quantitative analysis—encompassing motion blur, occlusion, and abrupt scale changes—and integrate an interactive segmentation-tracking paradigm combining template initialization, temporal propagation, and mask refinement. Experiments demonstrate that DAM4SAM and SAMURAI achieve substantial gains on high-difficulty sequences, yielding an overall average tracking accuracy improvement of 12.3%. Critically, our analysis uncovers a fundamental limitation in existing methods: inadequate sub-frame-level motion modeling. This work establishes the first dedicated evaluation suite and reproducible performance baseline for FMO tracking, advancing both benchmarking rigor and algorithmic development in this domain.
📝 Abstract
Several object tracking pipelines extending Segment Anything Model 2 (SAM2) have been proposed in the past year, each of which follows and segments the object from a single exemplar template provided by the user on an initialization frame. We propose to benchmark these high-performing trackers (SAM2, EfficientTAM, DAM4SAM, and SAMURAI) on datasets containing fast-moving objects (FMOs), specifically designed to be challenging for tracking approaches. The goal is to better understand the current limitations of state-of-the-art trackers by providing more detailed insights into their behavior. We show that, overall, the trackers DAM4SAM and SAMURAI perform well on the more challenging sequences.
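The abstract does not spell out how tracking accuracy is scored, but segmentation-tracker benchmarks of this kind commonly compare each predicted mask against ground truth with intersection-over-union (IoU) and report the fraction of frames above a success threshold. Below is a minimal sketch of that common scheme; the function names, the 0.5 threshold, and the per-frame success criterion are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / union if union else 1.0

def tracking_accuracy(preds, gts, thresh=0.5):
    """Fraction of frames whose predicted mask reaches IoU >= thresh.
    A common success criterion in tracking benchmarks; the exact
    metric used in this work may differ."""
    hits = [mask_iou(p, g) >= thresh for p, g in zip(preds, gts)]
    return sum(hits) / len(hits)

# Toy example: two 4x4 frames with a 2x2 ground-truth object.
gt = np.zeros((4, 4), bool)
gt[1:3, 1:3] = True
good = gt.copy()               # perfect prediction, IoU = 1.0
bad = np.zeros((4, 4), bool)   # empty prediction, IoU = 0.0
print(tracking_accuracy([good, bad], [gt, gt]))  # → 0.5
```

Per-frame IoU also supports the kind of fine-grained breakdown the summary describes: sequences can be bucketed by attribute (motion blur, occlusion, scale change) and the same score averaged within each bucket.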