🤖 AI Summary
This work addresses the longstanding challenge in multi-object tracking (MOT) of jointly optimizing detection accuracy, identity preservation, and spatiotemporal consistency. The authors propose a plug-and-play, differentiable graph-theoretic loss function that, for the first time, unifies these three objectives into a single end-to-end differentiable training target. Notably, this approach requires no architectural modifications to existing MOT frameworks and can be seamlessly integrated into mainstream tracking pipelines through differentiable graph representation learning. Extensive experiments demonstrate consistent performance gains across multiple benchmark models and datasets: identity switches are reduced by up to 53%, IDF1 scores improve by as much as 12%, and on SportsMOT, the method boosts GTR’s MOTA by 9.7%.
📝 Abstract
We present UniTrack, a plug-and-play graph-theoretic loss function designed to significantly enhance multi-object tracking (MOT) performance by directly optimizing tracking-specific objectives through unified differentiable learning. Unlike prior graph-based MOT methods that redesign tracking architectures, UniTrack provides a universal training objective that integrates detection accuracy, identity preservation, and spatiotemporal consistency into a single end-to-end trainable loss function, enabling seamless integration with existing MOT systems without architectural modifications. Through differentiable graph representation learning, UniTrack enables networks to learn holistic representations of motion continuity and identity relationships across frames. We validate UniTrack across diverse tracking models, including Trackformer, MOTR, FairMOT, ByteTrack, GTR, and MOTE, on multiple challenging benchmarks, demonstrating consistent improvements across all tested architectures and datasets. Extensive evaluations show up to a 53% reduction in identity switches and up to 12% IDF1 improvement, with GTR achieving a peak gain of 9.7% MOTA on SportsMOT.
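The abstract describes a single loss combining detection accuracy, identity preservation, and spatiotemporal consistency. Purely as an illustrative sketch (not UniTrack's actual formulation: the term definitions, function names, and weights `w_det`, `w_id`, `w_temp` below are all assumptions), such a unified objective could be a weighted sum of three differentiable terms:

```python
import numpy as np

# Illustrative sketch only. All names, weights, and term definitions here
# are assumptions for exposition, not the paper's actual loss.

def detection_loss(pred_boxes, gt_boxes):
    """Mean L1 regression error between predicted and ground-truth boxes."""
    return float(np.abs(pred_boxes - gt_boxes).mean())

def identity_loss(emb_t, emb_t1, matches):
    """Softmax cross-entropy over a cross-frame affinity matrix:
    each (unit-normalized) embedding at frame t should be most similar
    to its matched detection at frame t+1."""
    sim = emb_t @ emb_t1.T                                   # pairwise affinities
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))  # row-wise log-softmax
    return float(-logp[np.arange(len(matches)), matches].mean())

def temporal_loss(traj):
    """Penalize acceleration along a trajectory (motion smoothness);
    zero for perfectly linear motion."""
    accel = traj[2:] - 2 * traj[1:-1] + traj[:-2]            # discrete 2nd difference
    return float((accel ** 2).mean())

def unified_loss(pred_boxes, gt_boxes, emb_t, emb_t1, matches, traj,
                 w_det=1.0, w_id=1.0, w_temp=0.1):
    """Single scalar objective: each term is differentiable, so the sum
    could be minimized end to end in an autograd framework."""
    return (w_det * detection_loss(pred_boxes, gt_boxes)
            + w_id * identity_loss(emb_t, emb_t1, matches)
            + w_temp * temporal_loss(traj))

# Toy data: 3 identities with 8-dim appearance embeddings across two frames.
rng = np.random.default_rng(0)
emb_t = rng.normal(size=(3, 8))
emb_t /= np.linalg.norm(emb_t, axis=1, keepdims=True)
emb_t1 = emb_t + 0.05 * rng.normal(size=(3, 8))
emb_t1 /= np.linalg.norm(emb_t1, axis=1, keepdims=True)

loss = unified_loss(
    pred_boxes=np.array([[10., 10., 50., 50.]]),
    gt_boxes=np.array([[12., 11., 49., 52.]]),
    emb_t=emb_t, emb_t1=emb_t1,
    matches=np.array([0, 1, 2]),          # identity i at t matches i at t+1
    traj=np.array([[0., 0.], [1., 0.2], [2.1, 0.3], [3.0, 0.5]]),
)
```

In this toy form the whole objective is one scalar, which is what lets a plug-and-play loss attach to an existing tracker's outputs without changing its architecture.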