🤖 AI Summary
This work addresses the challenge of jointly modeling motion uncertainty, long-term temporal dependencies, and physical constraints in autonomous driving trajectory prediction. To this end, we propose a diffusion-based Transformer architecture that integrates Mamba with attention mechanisms and incorporates an energy-based Port-Hamiltonian neural network to explicitly embed physical priors. Our method efficiently fuses sensor streams and historical trajectories within the diffusion process, enabling physically consistent multimodal trajectory generation. Evaluated on standard autonomous driving benchmarks, the model significantly outperforms existing approaches in prediction accuracy, physical plausibility, and robustness, thereby demonstrating the effectiveness and interpretability of the proposed architecture.
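The summary's mention of fusing Mamba with attention rests on the linear state-space recurrence that Mamba-style layers use for sequence mixing. A minimal sketch of that recurrence (illustrative only; the layer names, shapes, and the selective/gating machinery of the actual architecture are not specified in this summary):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Sequential state-space scan: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    This is the core recurrence behind Mamba-style sequence mixing;
    real Mamba layers additionally make A, B, C input-dependent (selective).
    """
    d_state = A.shape[0]
    h = np.zeros(d_state)
    ys = []
    for x_t in x:
        h = A @ h + B * x_t   # fold the current input into the hidden state
        ys.append(C @ h)      # linear readout at each time step
    return np.array(ys)

rng = np.random.default_rng(0)
T, d_state = 8, 4
A = 0.9 * np.eye(d_state)     # contractive transition: old context decays
B = rng.normal(size=d_state)
C = rng.normal(size=d_state)
x = rng.normal(size=T)        # one scalar feature channel over T steps
y = ssm_scan(x, A, B, C)
print(y.shape)                # one output per time step
```

Because the state `h` carries the entire history forward, the scan aggregates long-range sequential context in O(T) time, which is the efficiency argument for pairing it with attention inside the diffusion denoiser.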
📝 Abstract
Autonomous driving systems demand trajectory planners that not only model the inherent uncertainty of future motions but also respect complex temporal dependencies and underlying physical laws. While diffusion-based generative models excel at capturing multi-modal distributions, they often fail to incorporate long-term sequential context and domain-specific physical priors. In this work, we bridge these gaps with two key innovations. First, we introduce a Diffusion Mamba Transformer architecture that embeds Mamba and attention blocks into the diffusion process, enabling more effective aggregation of sequential input context from sensor streams and past motion histories. Second, we design a Port-Hamiltonian Neural Network module that seamlessly integrates energy-based physical constraints into the diffusion model, enhancing trajectory predictions with both physical consistency and interpretability. Extensive evaluations on standard autonomous driving benchmarks demonstrate that our unified framework significantly outperforms state-of-the-art baselines in predictive accuracy, physical plausibility, and robustness, advancing safe and reliable motion planning.
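The Port-Hamiltonian module referenced above builds on dynamics of the form dx/dt = (J − R)∇H(x), where J is skew-symmetric (lossless interconnection), R is positive semi-definite (dissipation), and H is an energy function. A minimal numerical sketch of why this structure yields physically consistent trajectories (the quadratic H, matrices, and Euler integrator here are illustrative assumptions; in a PHNN, H would be a learned network):

```python
import numpy as np

# Port-Hamiltonian dynamics: dx/dt = (J - R) grad_H(x).
# J = -J.T conserves energy; R >= 0 only removes it, so the state
# cannot gain energy -- the physical prior the module encodes.

def grad_H(x):
    return x  # gradient of the toy energy H(x) = 0.5 * ||x||^2

J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])        # skew-symmetric interconnection
R = np.array([[0.1, 0.0],
              [0.0, 0.1]])         # positive semi-definite damping

def step(x, dt=0.01):
    return x + dt * (J - R) @ grad_H(x)  # explicit Euler update

x = np.array([1.0, 0.0])
E0 = 0.5 * x @ x                   # initial energy
for _ in range(100):
    x = step(x)
E1 = 0.5 * x @ x                   # energy after 100 steps
print(E1 < E0)                     # True: energy dissipates, never grows
```

Embedding this structure in the denoiser means sampled trajectories inherit the energy-dissipation guarantee by construction rather than by penalty, which is the source of the consistency and interpretability claims.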