🤖 AI Summary
Existing open-vocabulary multi-object tracking (OV-MOT) methods rely predominantly on instance-level detection and association, neglecting trajectory-level information; this leads to unstable associations under occlusion and weak classification generalization on ambiguous categories. This work introduces systematic trajectory consistency modeling and semantic enhancement for OV-MOT. We propose a Trajectory Consistency Reinforcement (TCR) strategy to ensure robust temporal association, and introduce TraCLIP, a trajectory classification module that jointly leverages Trajectory Feature Aggregation (TFA) and Trajectory Semantic Enrichment (TSE) to exploit trajectories from both visual and language perspectives. Built upon the CLIP framework, the approach integrates trajectory-aware temporal modeling, consistency regularization, and open-vocabulary semantic alignment. Evaluated on the OV-TAO benchmark, it achieves significant improvements in HOTA, MOTA, and other key metrics, empirically validating the critical role of trajectory information in OV-MOT. The code is publicly available.
📝 Abstract
Open-Vocabulary Multi-Object Tracking (OV-MOT) aims to track objects without being limited to a predefined set of categories. Current OV-MOT methods rely primarily on instance-level detection and association, often overlooking trajectory information that is unique and essential to object tracking. Exploiting trajectory information can enhance association stability and classification accuracy, especially under occlusion and category ambiguity, thereby improving adaptability to novel classes. Thus motivated, in this paper we propose **TRACT**, an open-vocabulary tracker that leverages trajectory information to improve both object association and classification in OV-MOT. Specifically, we introduce a *Trajectory Consistency Reinforcement* (**TCR**) strategy that benefits tracking performance by improving target identity and category consistency. In addition, we present **TraCLIP**, a plug-and-play trajectory classification module. It integrates *Trajectory Feature Aggregation* (**TFA**) and *Trajectory Semantic Enrichment* (**TSE**) strategies to fully leverage trajectory information from visual and language perspectives, enhancing classification results. Extensive experiments on OV-TAO show that our TRACT significantly improves tracking performance, highlighting trajectory information as a valuable asset for OV-MOT. Code will be released.
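To make the trajectory-classification idea concrete, here is a minimal sketch of the visual side of such a module. It is not the authors' implementation: mean pooling stands in for the learned TFA aggregation, and random unit vectors stand in for CLIP image/text embeddings. The point is the structure: pool per-frame features of one trajectory into a single descriptor, then score it against open-vocabulary text prompts by cosine similarity.

```python
import numpy as np

def classify_trajectory(frame_feats: np.ndarray, text_feats: np.ndarray):
    """Score one trajectory against open-vocabulary category prompts.

    frame_feats: (T, D) per-frame visual embeddings of the tracked object
    text_feats:  (C, D) text embeddings, one per candidate category
    Returns the index of the best-matching category and all cosine scores.
    """
    # Normalize per-frame features, then mean-pool over time.
    # (Mean pooling is a simple stand-in for the paper's TFA strategy.)
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    traj = f.mean(axis=0)
    traj /= np.linalg.norm(traj)

    # Cosine similarity between the pooled trajectory feature and each prompt.
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    scores = t @ traj
    return int(np.argmax(scores)), scores

# Toy usage with mock embeddings (hypothetical data, not CLIP outputs):
rng = np.random.default_rng(0)
text = np.eye(3, 8)                                   # 3 mock category embeddings
frames = np.tile(text[1], (5, 1)) + 0.05 * rng.standard_normal((5, 8))
best, scores = classify_trajectory(frames, text)      # best matches category 1
```

Pooling over the whole trajectory is what makes the prediction robust to single-frame failures (blur, partial occlusion): a few bad frames are averaged out rather than each voting independently.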