🤖 AI Summary
Problem: Multi-object tracking (MOT) struggles to exploit kinematic and re-identification features jointly, and conventional trajectory inference methods are non-differentiable, which hinders end-to-end learning.
Method: We propose an end-to-end trainable MOT framework based on graph neural networks (GNNs). It constructs a tracking graph over a batch of frames that integrates kinematic and re-identification (appearance) features, computes the edge costs with a message-passing GNN, applies a successive shortest paths (SSP) algorithm for trajectory association, and uses bilevel optimization to learn the GNN parameters through the SSP solutions. A dedicated alignment loss explicitly guides SSP outputs toward ground-truth trajectories.
Results: Experiments on simulated scenarios of varying complexity show that the method compares favorably to a strong baseline. The evaluation also examines sensitivity to scenario aspects, such as detection noise and occlusion, and to model hyperparameters.
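The bilevel structure named in the method can be written schematically as below. This is a generic sketch, not the paper's exact formulation: the symbols are assumptions introduced here, with $\theta$ the GNN parameters, $c_{\theta}$ the vector of edge costs the GNN produces, $y$ a binary edge-selection vector, $\mathcal{F}$ the set of feasible track assignments on the tracking graph, $y^{\mathrm{gt}}$ the ground-truth assignment, and $\ell$ the alignment loss.

```latex
\begin{aligned}
\min_{\theta}\quad & \ell\bigl(y^{\star}(\theta),\, y^{\mathrm{gt}}\bigr) \\
\text{s.t.}\quad & y^{\star}(\theta) \in \arg\min_{y \in \mathcal{F}} \; c_{\theta}^{\top} y
\end{aligned}
```

The inner problem is the one SSP solves for fixed edge costs. The outer problem adjusts $\theta$ so that the inner solution moves toward the ground-truth tracks.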
📝 Abstract
We propose a graph-based tracking formulation for multi-object tracking (MOT) where target detections contain kinematic information and re-identification features (attributes). Our method applies a successive shortest paths (SSP) algorithm to a tracking graph defined over a batch of frames. The edge costs in this tracking graph are computed via a message-passing network, a graph neural network (GNN) variant. The parameters of the GNN, and hence of the tracker, are learned end-to-end on a training set of example ground-truth tracks and detections. Specifically, learning takes the form of bilevel optimization guided by our novel loss function. We evaluate our algorithm on simulated scenarios to understand its sensitivity to scenario aspects and model hyperparameters. Across varied scenario complexities, our method compares favorably to a strong baseline.
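To make the inference step concrete, here is a minimal sketch of successive shortest paths on a small tracking graph, using the common min-cost-flow construction (each detection is split into an "in" and an "out" node, all edges have unit capacity). It is an illustration under stated assumptions, not the authors' implementation: the helper names, the 1-D positions, the gating threshold, and every cost value are hypothetical and hand-set, whereas in the paper the edge costs come from the GNN.

```python
from collections import defaultdict

INF = float("inf")

def add_edge(graph, u, v, cost):
    """Add a unit-capacity edge u->v plus its zero-capacity residual edge."""
    # Edge layout: [target, capacity, cost, index of paired edge, is_forward]
    graph[u].append([v, 1, cost, len(graph[v]), True])
    graph[v].append([u, 0, -cost, len(graph[u]) - 1, False])

def build_tracking_graph(detections, gate=1.0, enter_cost=0.5, exit_cost=0.5):
    """detections: id -> (frame, x, det_cost). All costs are hand-set placeholders."""
    graph = defaultdict(list)
    for i, (frame, x, det_cost) in detections.items():
        add_edge(graph, "s", ("in", i), enter_cost)      # track starts here
        add_edge(graph, ("in", i), ("out", i), det_cost)  # negative = likely real
        add_edge(graph, ("out", i), "t", exit_cost)      # track ends here
        for j, (frame_j, x_j, _) in detections.items():
            if frame_j == frame + 1 and abs(x_j - x) <= gate:
                add_edge(graph, ("out", i), ("in", j), abs(x_j - x))
    return graph

def shortest_path(graph, source, sink):
    """Bellman-Ford on the residual graph (costs may be negative)."""
    nodes = list(graph)
    dist = {n: INF for n in nodes}
    dist[source] = 0.0
    parent = {}
    for _ in range(len(nodes) - 1):
        updated = False
        for u in nodes:
            if dist[u] == INF:
                continue
            for idx, (v, cap, cost, _rev, _fwd) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                    dist[v] = dist[u] + cost
                    parent[v] = (u, idx)
                    updated = True
        if not updated:
            break
    return dist[sink], parent

def successive_shortest_paths(graph, source="s", sink="t"):
    """Push one unit of flow per iteration while the best path lowers total cost."""
    total = 0.0
    while True:
        d, parent = shortest_path(graph, source, sink)
        if d >= 0:  # also covers an unreachable sink (d == INF)
            return total
        v = sink
        while v != source:
            u, idx = parent[v]
            edge = graph[u][idx]
            edge[1] -= 1                  # consume capacity on the path edge
            graph[v][edge[3]][1] += 1     # open the paired residual edge
            v = u
        total += d

def extract_tracks(graph, source="s", sink="t"):
    """Follow saturated forward edges from the source to read off each track."""
    tracks = []
    for v, cap, _cost, _rev, fwd in graph[source]:
        if not (fwd and cap == 0):
            continue
        track, node = [], v
        while node != sink:
            if node[0] == "in":
                track.append(node[1])
            node = next(e[0] for e in graph[node] if e[4] and e[1] == 0)
        tracks.append(track)
    return tracks

# Toy scene: two objects over three frames, plus one unlikely detection (id 6).
detections = {
    0: (0, 0.0, -2.0), 1: (0, 5.0, -2.0),
    2: (1, 0.2, -2.0), 3: (1, 5.1, -2.0), 6: (1, 9.0, 0.3),
    4: (2, 0.3, -2.0), 5: (2, 5.3, -2.0),
}
graph = build_tracking_graph(detections)
total_cost = successive_shortest_paths(graph)
tracks = extract_tracks(graph)
print(tracks, total_cost)
```

The loop stops as soon as the cheapest remaining source-to-sink path has non-negative cost, so the number of tracks is decided by the costs rather than fixed in advance. Bellman-Ford is used here only because it handles negative edge costs with little code. A learned tracker would replace the hand-set costs with GNN outputs and leave this inference loop unchanged.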