Seg2Track-SAM2: SAM2-based Multi-object Tracking and Segmentation for Zero-shot Generalization

📅 2025-09-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address weak identity consistency and high memory overhead in multi-object tracking and segmentation (MOTS) for dynamic scenes, this paper proposes a zero-shot framework that requires no fine-tuning. It couples a pre-trained object detector with SAM2 and introduces a Seg2Track module that handles track initialization, track management, and reinforcement, which improves identity continuity. A sliding-window memory strategy reduces memory usage by up to 75% with negligible performance degradation. The method is detector-agnostic and generalizable to video instance segmentation. Evaluated on the KITTI MOT and KITTI MOTS benchmarks, it ranks fourth overall in both the car and pedestrian classes on KITTI MOTS and sets a new best in association accuracy (AssA). These results indicate efficient memory use and strong zero-shot generalization across object categories and detectors.

📝 Abstract
Autonomous systems require robust Multi-Object Tracking (MOT) capabilities to operate reliably in dynamic environments. MOT ensures consistent object identity assignment and precise spatial delineation. Recent advances in foundation models, such as SAM2, have demonstrated strong zero-shot generalization for video segmentation, but their direct application to MOTS (MOT+Segmentation) remains limited by insufficient identity management and memory efficiency. This work introduces Seg2Track-SAM2, a framework that integrates pre-trained object detectors with SAM2 and a novel Seg2Track module to address track initialization, track management, and reinforcement. The proposed approach requires no fine-tuning and remains detector-agnostic. Experimental results on KITTI MOT and KITTI MOTS benchmarks show that Seg2Track-SAM2 achieves state-of-the-art (SOTA) performance, ranking fourth overall in both car and pedestrian classes on KITTI MOTS, while establishing a new benchmark in association accuracy (AssA). Furthermore, a sliding-window memory strategy reduces memory usage by up to 75% with negligible performance degradation, supporting deployment under resource constraints. These results confirm that Seg2Track-SAM2 advances MOTS by combining robust zero-shot tracking, enhanced identity preservation, and efficient memory utilization. The code is available at https://github.com/hcmr-lab/Seg2Track-SAM2
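The abstract does not spell out how detections become new tracks, so the following is only a minimal sketch of one common approach, assuming mask IoU is used to decide whether a detection is already covered by an existing track. The function names `mask_iou` and `new_track_candidates` and the `iou_thresh` value are hypothetical and are not taken from the Seg2Track-SAM2 code.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks of equal shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def new_track_candidates(det_masks, track_masks, iou_thresh=0.5):
    """Return indices of detections that overlap no existing track mask.

    Assumption: a detection whose best IoU against all tracked masks is
    below `iou_thresh` is treated as a previously unseen object.
    """
    candidates = []
    for i, det in enumerate(det_masks):
        best = max((mask_iou(det, trk) for trk in track_masks), default=0.0)
        if best < iou_thresh:
            candidates.append(i)
    return candidates


# Toy example: one tracked object, two detections.
track = np.zeros((8, 8), dtype=bool)
track[0:4, 0:4] = True
det_same = np.zeros((8, 8), dtype=bool)
det_same[0:4, 0:4] = True  # coincides with the tracked object
det_new = np.zeros((8, 8), dtype=bool)
det_new[4:8, 4:8] = True   # disjoint from every tracked mask
result = new_track_candidates([det_same, det_new], [track])
```

Here only the second detection is returned as a candidate for a new track. In a full pipeline, such candidates would presumably be passed to SAM2 as prompts, but that step depends on details the abstract does not give.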
Problem

Research questions and friction points this paper is trying to address.

Integrating SAM2 for zero-shot multi-object tracking and segmentation
Addressing identity management and memory efficiency limitations in MOTS
Achieving robust tracking without fine-tuning or detector dependency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates SAM2 with object detectors
Uses Seg2Track module for identity management
Implements sliding-window memory strategy
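
The sliding-window idea in the last item can be illustrated with a bounded buffer. This is a sketch under stated assumptions, not the paper's implementation: the class name `SlidingWindowMemory`, the window size, and the string placeholders standing in for per-frame memory features are all hypothetical.

```python
from collections import deque


class SlidingWindowMemory:
    """Keep per-frame memory entries for only the most recent frames."""

    def __init__(self, window_size: int):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # A deque with maxlen drops the oldest entry automatically.
        self._frames = deque(maxlen=window_size)

    def add(self, frame_idx: int, features) -> None:
        self._frames.append((frame_idx, features))

    def frame_indices(self):
        return [idx for idx, _ in self._frames]

    def __len__(self) -> int:
        return len(self._frames)


# Toy run: 16 frames pass through a window of 4.
memory = SlidingWindowMemory(window_size=4)
for t in range(16):
    memory.add(t, features=f"feat_{t}")
```

After the loop the buffer holds only frames 12 to 15. Storing 4 of 16 entries is a 75% reduction in this toy setting; the numbers are chosen for illustration and do not reflect the window size or measurement behind the paper's reported figure.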
Authors

Diogo Mendonça
University of Coimbra, Institute of Systems and Robotics, Department of Electrical and Computer Engineering, Portugal
Tiago Barros
PhD, Researcher @University of Coimbra
3D Perception, Robotics, Deep Learning
Cristiano Premebida
University of Coimbra, DEEC, ISR-UC
Robotic perception, Machine learning, Mobile robotics, Autonomous vehicles, Agricultural robotics
Urbano J. Nunes
University of Coimbra, Institute of Systems and Robotics, Department of Electrical and Computer Engineering, Portugal