🤖 AI Summary
Multi-object tracking (MOT) has long been split between online and offline paradigms, which makes it hard to achieve real-time inference and long-term trajectory consistency at the same time. This paper proposes NOOUGAT, which it presents as the first unified, flexible temporal tracking framework that supports both online and offline operation. The approach partitions a video sequence into non-overlapping subclips, models local trajectories within each subclip using graph neural networks (GNNs), and fuses subclips autoregressively through an Autoregressive Long-term Tracking (ALT) layer. The subclip size sets the temporal window, and with it the inference latency and the amount of temporal context available. On DanceTrack, SportsMOT, and MOT20, the method improves online AssA by +2.3, +9.2, and +5.0 points, respectively, with larger gains reported in offline mode. To the authors' knowledge, this is the first end-to-end, temporally scalable MOT framework to unify online and offline tracking.
📝 Abstract
The long-standing division between online and offline Multi-Object Tracking (MOT) has led to fragmented solutions that fail to address the flexible temporal requirements of real-world deployment scenarios. Current online trackers rely on frame-by-frame hand-crafted association strategies and struggle with long-term occlusions, whereas offline approaches can cover larger time gaps, but still rely on heuristic stitching for arbitrarily long sequences. In this paper, we introduce NOOUGAT, the first tracker designed to operate with arbitrary temporal horizons. NOOUGAT leverages a unified Graph Neural Network (GNN) framework that processes non-overlapping subclips, and fuses them through a novel Autoregressive Long-term Tracking (ALT) layer. The subclip size controls the trade-off between latency and temporal context, enabling a wide range of deployment scenarios, from frame-by-frame to batch processing. NOOUGAT achieves state-of-the-art performance across both tracking regimes, improving online AssA by +2.3 on DanceTrack, +9.2 on SportsMOT, and +5.0 on MOT20, with even greater gains in offline mode.
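The control flow described in the abstract, where non-overlapping subclips are processed one at a time and folded into a running long-term state, might be sketched as follows. This is only an illustration of the loop structure, not the authors' implementation: `split_into_subclips`, `track_autoregressively`, `track_subclip`, and `fuse` are hypothetical names, and the two callables are stand-ins for the per-subclip GNN and the ALT layer, whose internals the abstract does not specify.

```python
from typing import Callable, Iterator, Sequence, TypeVar

Frame = TypeVar("Frame")
Local = TypeVar("Local")
State = TypeVar("State")


def split_into_subclips(
    frames: Sequence[Frame], subclip_size: int
) -> Iterator[Sequence[Frame]]:
    """Yield consecutive, non-overlapping subclips; the last may be shorter."""
    if subclip_size < 1:
        raise ValueError("subclip_size must be >= 1")
    for start in range(0, len(frames), subclip_size):
        yield frames[start:start + subclip_size]


def track_autoregressively(
    frames: Sequence[Frame],
    subclip_size: int,
    track_subclip: Callable[[Sequence[Frame]], Local],  # hypothetical: per-subclip tracker
    fuse: Callable[[State, Local], State],              # hypothetical: cross-clip fusion
    initial_state: State,
) -> State:
    """Fold each subclip's local result into the running long-term state."""
    state = initial_state
    for subclip in split_into_subclips(frames, subclip_size):
        local_tracks = track_subclip(subclip)  # only sees the current subclip
        state = fuse(state, local_tracks)      # depends on all earlier subclips via state
    return state
```

Under these assumptions, `subclip_size=1` corresponds to frame-by-frame (online) operation, while `subclip_size=len(frames)` processes the whole sequence as a single batch (offline); intermediate values trade latency for temporal context.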