🤖 AI Summary
To address high redundancy, low computational efficiency, and poor scalability in multi-agent trajectory prediction for large-scale traffic scenarios, this paper proposes SparScene, a sparse graph learning framework whose connectivity follows the lane-graph topology. Instead of building dense graphs from distance thresholds or full connectivity, the method uses lane-topology priors to construct structure-aware sparse graphs with far fewer edges. A lightweight graph encoder then aggregates agent-map and agent-agent interactions into a compact scene representation. On the motion prediction benchmark of the Waymo Open Motion Dataset (WOMD), the approach generates trajectories for more than 200 agents in a scene within 5 ms; inference over more than 5,000 agents and 17,000 lanes takes 54 ms with 2.9 GB of GPU memory. The method reports competitive prediction accuracy together with high inference efficiency and scalability in large-scale traffic scenes.
📝 Abstract
Multi-agent trajectory generation is a core problem for autonomous driving and intelligent transportation systems. However, efficiently modeling the dynamic interactions between numerous road users and infrastructure in complex scenes remains an open problem. Existing methods typically employ distance-based or fully connected dense graph structures to capture interaction information. These structures introduce a large number of redundant edges and require complex, heavily parameterized networks for encoding, which lowers training and inference efficiency and limits scalability to large and complex traffic scenes. To overcome these limitations, we propose SparScene, a sparse graph learning framework designed for efficient and scalable traffic scene representation. Instead of relying on distance thresholds, SparScene leverages the lane graph topology to construct structure-aware sparse connections between agents and lanes, enabling an efficient yet informative scene graph representation. SparScene adopts a lightweight graph encoder that efficiently aggregates agent-map and agent-agent interactions, yielding compact scene representations with substantially improved efficiency and scalability. On the motion prediction benchmark of the Waymo Open Motion Dataset (WOMD), SparScene achieves competitive performance with remarkable efficiency. It generates trajectories for more than 200 agents in a scene within 5 ms and scales to more than 5,000 agents and 17,000 lanes with only 54 ms of inference time and 2.9 GB of GPU memory, highlighting its scalability for large-scale traffic scenes.
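To make the contrast between distance-based and topology-based connectivity concrete, the following is a minimal sketch, not SparScene's actual construction rule. The toy scene, the `LANE_SUCCESSORS` map, and the rule "connect two agents only if they share a lane or their lanes are linked by a successor edge" are all assumptions introduced here for illustration; the abstract does not specify the exact edge rule.

```python
from itertools import combinations
from math import hypot

# Hypothetical toy scene: (agent_id, x, y, lane_id). Not taken from WOMD.
AGENTS = [
    ("a0", 0.0, 0.0, "L0"),
    ("a1", 8.0, 0.0, "L0"),
    ("a2", 16.0, 0.0, "L1"),
    ("a3", 4.0, 3.5, "L2"),   # neighbouring lane: close in space, unrelated in topology
    ("a4", 12.0, 3.5, "L2"),
]

# Hypothetical lane graph: lane_id -> ids of its successor lanes.
LANE_SUCCESSORS = {"L0": ["L1"], "L1": [], "L2": []}


def distance_edges(agents, radius):
    """Dense baseline: connect every pair of agents within `radius` metres."""
    edges = set()
    for (i, xi, yi, _), (j, xj, yj, _) in combinations(agents, 2):
        if hypot(xi - xj, yi - yj) <= radius:
            edges.add((i, j))
    return edges


def topology_edges(agents, successors):
    """Sparse variant (assumed rule): connect two agents only if they are on
    the same lane or on lanes joined by a successor edge in either direction."""
    def linked(lane_a, lane_b):
        return (
            lane_a == lane_b
            or lane_b in successors.get(lane_a, ())
            or lane_a in successors.get(lane_b, ())
        )

    edges = set()
    for (i, _, _, lane_i), (j, _, _, lane_j) in combinations(agents, 2):
        if linked(lane_i, lane_j):
            edges.add((i, j))
    return edges


dense = distance_edges(AGENTS, radius=20.0)
sparse = topology_edges(AGENTS, LANE_SUCCESSORS)
print(f"distance-based edges: {len(dense)}, topology-based edges: {len(sparse)}")
```

In this toy scene the distance rule connects all 10 agent pairs, while the assumed topology rule keeps only the 4 pairs that share or follow a lane. The dense count grows quadratically with the number of nearby agents, whereas the topology-based count is bounded by lane occupancy and lane connectivity, which is the kind of edge reduction the abstract attributes to lane-graph-aware sparsity.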