An Open-Source Modular Benchmark for Diffusion-Based Motion Planning in Closed-Loop Autonomous Driving

📅 2026-03-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations in how diffusion-based motion planners are evaluated and deployed for closed-loop autonomous driving: existing evaluations neglect ROS 2 communication latency and real-time scheduling constraints, and monolithic ONNX deployment freezes solver parameters at export time. To overcome these issues, the authors use ONNX GraphSurgeon to decompose the diffusion planner into three independently executable modules and reimplement the DPM-Solver++ denoising loop in C++. Integrated as a ROS 2 node within Autoware and validated in AWSIM closed-loop simulation, the system supports runtime reconfiguration of solver parameters and per-step observability of the denoising process. Experiments show that encoder caching reduces latency by 3.2×, and that second-order solving reduces Final Displacement Error (FDE) by 41% at N=3 compared to first-order solving. The result is a modular, configurable, and observable deployment of a diffusion planner in a production-grade autonomous driving stack, offering a direct path from simulation benchmarks to real-vehicle deployment.

📝 Abstract

Diffusion-based motion planners have achieved state-of-the-art results on benchmarks such as nuPlan, yet their evaluation within closed-loop production autonomous driving stacks remains largely unexplored. Existing evaluations abstract away ROS 2 communication latency and real-time scheduling constraints, while monolithic ONNX deployment freezes all solver parameters at export time. We present an open-source modular benchmark that addresses both gaps: using ONNX GraphSurgeon, we decompose a monolithic 18,398-node diffusion planner into three independently executable modules and reimplement the DPM-Solver++ denoising loop in native C++. Integrated as a ROS 2 node within Autoware, the open-source AD stack deployed on real vehicles worldwide, the system enables runtime-configurable solver parameters without model recompilation and per-step observability of the denoising process, breaking the black box of monolithic deployment. Unlike evaluations in standalone simulators such as CARLA, our benchmark operates within a production-grade stack and is validated through AWSIM closed-loop simulation. Through systematic comparison of DPM-Solver++ (first- and second-order) and DDIM across six step-count configurations (N in {3, 5, 7, 10, 15, 20}), we show that encoder caching yields a 3.2x latency reduction, and that second-order solving reduces FDE by 41% at N=3 compared to first-order. The complete codebase will be released as open source, providing a direct path from simulation benchmarks to real-vehicle deployment.
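To make the abstract's three ideas concrete (an encoder that runs once per planning cycle, a solver whose step count and order are ordinary runtime arguments, and a per-step trace of the denoising process), here is a minimal Python sketch. It is not the authors' C++ implementation. The names `CountingEncoder`, `toy_denoiser`, `dpm_solver_pp`, and `plan` are hypothetical, the encoder and denoiser are toy stand-ins for the exported ONNX modules, the log-SNR schedule is an assumed one, and the second-order branch uses the multistep (2M) form of DPM-Solver++, which the abstract does not specify.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vec = List[float]


def alpha_sigma(lam: float) -> Tuple[float, float]:
    """Assumed variance-preserving schedule, parameterised by lam = log(alpha / sigma)."""
    alpha = math.sqrt(1.0 / (1.0 + math.exp(-2.0 * lam)))
    sigma = math.sqrt(1.0 / (1.0 + math.exp(2.0 * lam)))
    return alpha, sigma


class CountingEncoder:
    """Toy stand-in for the scene encoder module; counts how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, obs: Sequence[float]) -> Vec:
        self.calls += 1
        return [float(v) for v in obs]


def toy_denoiser(x: Vec, lam: float, context: Vec) -> Vec:
    """Toy data-prediction model: always predicts the encoded context as x0."""
    return list(context)


def dpm_solver_pp(
    x: Vec,
    context: Vec,
    denoise: Callable[[Vec, float, Vec], Vec],
    n_steps: int,
    order: int,
    lam_start: float = -5.0,
    lam_end: float = 5.0,
) -> Tuple[Vec, List[Vec]]:
    """DPM-Solver++ in data-prediction form; order 1, or order 2 multistep (2M)."""
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")
    # Time grid uniform in lam (an assumption of this sketch).
    lams = [lam_start + (lam_end - lam_start) * i / n_steps for i in range(n_steps + 1)]
    trace: List[Vec] = [list(x)]  # per-step observability
    prev_x0: Optional[Vec] = None
    prev_h = 0.0
    for i in range(n_steps):
        h = lams[i + 1] - lams[i]
        _, sigma_s = alpha_sigma(lams[i])
        alpha_t, sigma_t = alpha_sigma(lams[i + 1])
        x0 = denoise(x, lams[i], context)
        if order == 2 and prev_x0 is not None:
            # Extrapolate with the previous prediction; the first step stays first-order.
            r = prev_h / h
            d = [(1.0 + 0.5 / r) * a - (0.5 / r) * b for a, b in zip(x0, prev_x0)]
        else:
            d = x0
        coef = -alpha_t * math.expm1(-h)  # equals alpha_t * (1 - exp(-h))
        x = [(sigma_t / sigma_s) * xi + coef * di for xi, di in zip(x, d)]
        trace.append(list(x))
        prev_x0, prev_h = x0, h
    return x, trace


def plan(obs: Sequence[float], noise: Vec, encoder: CountingEncoder,
         n_steps: int = 3, order: int = 2) -> Tuple[Vec, List[Vec]]:
    """One planning cycle: encode once, then reuse the context in every solver step."""
    context = encoder(obs)
    return dpm_solver_pp(list(noise), context, toy_denoiser, n_steps, order)
```

Calling `plan((1.0, 2.0), [0.5, -0.5], CountingEncoder(), n_steps=10, order=2)` runs the encoder once regardless of `n_steps`, and the returned trace holds the initial sample plus one entry per denoising step. Because `n_steps` and `order` are plain arguments, they can be changed between calls without touching the model, which is the property the abstract contrasts with a monolithic export.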

Problem

Research questions and friction points this paper is trying to address.

diffusion-based motion planning

closed-loop autonomous driving

modular benchmark

real-time constraints

solver parameter configurability

Innovation

Methods, ideas, or system contributions that make the work stand out.

modular diffusion planner

closed-loop autonomous driving

ONNX GraphSurgeon