AI Summary
To address the lack of robustness of end-to-end autonomous driving planners in long-tail scenarios, this paper proposes CorrectAD, a fully automated self-correction system built on a world model (a diffusion-based video generator). Methodologically, it introduces (1) PM-Agent, an agent that plays the role of a product manager and generates structured data requirements from planner failure cases, closing the correction loop; (2) DriveSora, a diffusion-based video generation model that produces spatiotemporally consistent, high-fidelity driving videos aligned with 3D scene layouts; and (3) a scalable data repair pipeline that combines diffusion video generation, 3D layout control, and an agentic architecture. Evaluated on nuScenes and an in-house dataset, the system corrects 62.5% and 49.8% of planner failure cases, respectively, while reducing collision rates by 39% and 27%. Because these results are obtained across multiple end-to-end planners, they indicate improved safety and generalization that is not tied to a single planning architecture.
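The two headline metrics can be made concrete with a small sketch. The summary does not define them, so the definitions below are assumptions: the correction rate is taken as the share of originally failing cases that pass after correction, and "reducing collision rates by X%" is read as a relative reduction rather than a drop in percentage points. The function names `correction_rate` and `relative_reduction` are hypothetical.

```python
def correction_rate(failed_before: set, failed_after: set) -> float:
    """Share of originally failing cases that pass after correction.

    Assumed definition: cases are identified by ID, and only cases in
    `failed_before` count toward the rate (new failures are ignored here).
    """
    if not failed_before:
        raise ValueError("no failure cases to correct")
    corrected = failed_before - failed_after
    return len(corrected) / len(failed_before)


def relative_reduction(before: float, after: float) -> float:
    """Relative drop in a rate, (before - after) / before.

    Assumed reading of "reduces collision rate by X%"; an absolute
    (percentage-point) reading would simply be before - after.
    """
    if before <= 0:
        raise ValueError("baseline rate must be positive")
    return (before - after) / before
```

For example, fixing 3 of 4 failing cases gives a correction rate of 0.75, and a collision rate falling from 0.5 to 0.3 counts as a 40% relative reduction. If the paper instead reports absolute differences, only `relative_reduction` would need to change.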
Abstract
End-to-end planning methods are the de facto standard in current autonomous driving systems, but the robustness of these data-driven approaches suffers from the notorious long-tail problem (i.e., rare but safety-critical failure cases). In this work, we explore whether recent diffusion-based video generation methods (a.k.a. world models), paired with structured 3D layouts, can enable a fully automated pipeline to self-correct such failure cases. We first introduce an agent that simulates the role of a product manager, dubbed PM-Agent, which formulates data requirements to collect data similar to the failure cases. Then, we use a generative model that can simulate both data collection and annotation. However, existing generative models struggle to generate high-fidelity data conditioned on 3D layouts. To address this, we propose DriveSora, which generates spatiotemporally consistent videos aligned with the 3D annotations requested by PM-Agent. We integrate these components into our self-correcting agentic system, CorrectAD. Importantly, our pipeline is model-agnostic and can be applied to improve any end-to-end planner. Evaluated on both nuScenes and a more challenging in-house dataset across multiple end-to-end planners, CorrectAD corrects 62.5% and 49.8% of failure cases and reduces collision rates by 39% and 27%, respectively.
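The closed loop described above (find failures, let PM-Agent formulate data requirements, synthesize annotated videos from 3D layouts, then update the planner) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CorrectAD implementation: every class and function name (`Box3D`, `Scenario`, `DataRequirement`, `Sample`, `self_correct`, and the callables passed to it) is hypothetical, and the PM-Agent and DriveSora stages are stand-in callables rather than real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Box3D:
    """One object in a 3D layout; doubles as its own annotation (assumed schema)."""
    category: str
    center: Tuple[float, float, float]  # x, y, z in metres, ego frame (assumed)
    size: Tuple[float, float, float]    # length, width, height in metres
    yaw: float = 0.0                    # heading in radians


@dataclass
class Scenario:
    scenario_id: str
    categories: List[str]  # object types that make this case hard (illustrative)


@dataclass
class DataRequirement:
    """Structured request a PM-Agent-like component might emit (hypothetical)."""
    categories: List[str]
    num_clips: int


@dataclass
class Sample:
    frames: list        # synthesized video frames (placeholder)
    boxes: List[Box3D]  # layout the video was conditioned on, reused as labels


def self_correct(
    scenarios: List[Scenario],
    evaluate: Callable[[Scenario], bool],                 # True if planner succeeds
    formulate: Callable[[Scenario], DataRequirement],     # PM-Agent stand-in
    generate: Callable[[DataRequirement], List[Sample]],  # DriveSora stand-in
    finetune: Callable[[List[Sample]], None],             # updates the planner
    rounds: int = 1,
) -> List[Scenario]:
    """Run up to `rounds` correction cycles; return the cases that still fail."""
    failures = [s for s in scenarios if not evaluate(s)]
    for _ in range(rounds):
        if not failures:
            break
        synthetic: List[Sample] = []
        for case in failures:
            # One structured requirement per failure, then synthesize matching clips.
            synthetic.extend(generate(formulate(case)))
        finetune(synthetic)
        # Re-check only the original failures, as in a correction-rate metric.
        failures = [s for s in failures if not evaluate(s)]
    return failures
```

The planner appears only through the `evaluate` and `finetune` callables, which is one way to reflect the model-agnostic claim: any end-to-end planner that can be scored on a scenario and retrained on new samples fits the same loop. Because each synthesized `Sample` carries the 3D boxes it was conditioned on, generation and annotation come from the same step, matching the point that the generative model simulates both data collection and annotation.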