World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving

📅 2025-07-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Accident anticipation in autonomous driving faces two key challenges: a scarcity of diverse, high-quality training data, and the frequent loss of critical object-level cues to environmental disruptions or sensor deficiencies. To address these, we propose an end-to-end framework with two parts. The first is a video generation pipeline in which a world model, guided by domain-informed prompts, produces high-resolution driving scenarios that enrich coverage of edge cases and complex interactions. The second is a dynamic prediction model that combines strengthened graph convolutions with dilated temporal operators to stay robust when observations are incomplete or visually noisy. Our contributions are threefold: (1) a domain-prompt-driven video generation method designed for accident anticipation; (2) a spatiotemporal modeling architecture that tolerates missing object cues; and (3) a newly released benchmark dataset designed to capture diverse real-world driving risks. Experiments on public datasets and the new benchmark show improved anticipation accuracy and longer warning lead times.

📝 Abstract

Reliable anticipation of traffic accidents is essential for advancing autonomous driving systems. However, this objective is limited by two fundamental challenges: the scarcity of diverse, high-quality training data and the frequent absence of crucial object-level cues due to environmental disruptions or sensor deficiencies. To tackle these issues, we propose a comprehensive framework combining generative scene augmentation with adaptive temporal reasoning. Specifically, we develop a video generation pipeline that utilizes a world model guided by domain-informed prompts to create high-resolution, statistically consistent driving scenarios, particularly enriching the coverage of edge cases and complex interactions. In parallel, we construct a dynamic prediction model that encodes spatio-temporal relationships through strengthened graph convolutions and dilated temporal operators, effectively addressing data incompleteness and transient visual noise. Furthermore, we release a new benchmark dataset designed to better capture diverse real-world driving risks. Extensive experiments on public and newly released datasets confirm that our framework enhances both the accuracy and lead time of accident anticipation, offering a robust solution to current data and modeling limitations in safety-critical autonomous driving applications.

Problem

Research questions and friction points this paper is trying to address.

Addressing scarcity of diverse, high-quality training data for autonomous driving

Overcoming absence of crucial object-level cues due to environmental disruptions

Improving accuracy and lead time of accident anticipation in autonomous systems
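The "lead time" in the last item can be made concrete with a small sketch. The abstract does not define the metric, so the definition below is an assumption borrowed from common practice in accident anticipation: the gap between the first frame whose predicted risk crosses a threshold and the frame where the accident begins. The function name `time_to_accident`, the default threshold of 0.5, and the 10 fps frame rate are all illustrative choices, not values from the paper.

```python
from typing import Optional, Sequence


def time_to_accident(
    scores: Sequence[float],
    accident_frame: int,
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
    fps: float = 10.0,       # assumed frame rate, not from the paper
) -> Optional[float]:
    """Return the warning lead time in seconds, or None if no warning fired.

    scores[i] is the predicted accident probability at frame i.
    accident_frame is the index of the frame where the accident starts.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Only frames strictly before the accident can count as a warning.
    for frame, score in enumerate(scores[:accident_frame]):
        if score >= threshold:
            return (accident_frame - frame) / fps
    return None


# Hypothetical per-frame risk scores for one clip.
scores = [0.1, 0.2, 0.6, 0.7, 0.9, 0.95]
lead = time_to_accident(scores, accident_frame=5)
print(lead)
```

Under this definition, a model that crosses the threshold earlier gets a larger lead time, which is why the metric is usually reported alongside accuracy rather than on its own: a model that always predicts high risk would maximize lead time while being useless as a warning system.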

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative scene augmentation with domain-informed prompts

Dynamic prediction model with strengthened graph convolutions and dilated temporal operators

New benchmark dataset for diverse driving risks
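To give a feel for the two building blocks named in the prediction model, here is a minimal pure-Python sketch. It is not the paper's architecture, whose details are not given on this page. It assumes the simplest possible forms: a graph step that averages each object's features over its visible neighbors, so occluded objects are skipped and the mean is renormalized, and a causal 1D convolution whose taps are spaced `dilation` frames apart. The names `masked_graph_aggregate` and `dilated_causal_conv` are hypothetical, and no learned weights are included.

```python
from typing import List

Vector = List[float]


def masked_graph_aggregate(
    features: List[Vector],
    adjacency: List[List[int]],
    visible: List[bool],
) -> List[Vector]:
    """Average each node's features with those of its visible neighbors.

    Nodes with visible[j] == False (e.g. occluded objects) contribute
    nothing, so the mean is taken only over objects actually observed.
    """
    n = len(features)
    dim = len(features[0])
    out: List[Vector] = []
    for i in range(n):
        # Self-loop plus graph neighbors, restricted to visible nodes.
        members = [j for j in range(n)
                   if visible[j] and (j == i or adjacency[i][j])]
        if not members:
            out.append([0.0] * dim)
            continue
        out.append([sum(features[j][d] for j in members) / len(members)
                    for d in range(dim)])
    return out


def dilated_causal_conv(series: List[float], kernel: List[float],
                        dilation: int) -> List[float]:
    """Causal 1D convolution: output[t] mixes series[t], series[t - d], ...

    Positions before the start of the series are treated as zeros.
    """
    out = []
    for t in range(len(series)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                total += weight * series[idx]
        out.append(total)
    return out


# Three objects in one frame; the middle one is occluded.
frame_features = [[1.0], [3.0], [5.0]]
fully_connected = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
aggregated = masked_graph_aggregate(frame_features, fully_connected,
                                    visible=[True, False, True])

# One feature tracked over five frames, taps two frames apart.
smoothed = dilated_causal_conv([1.0, 2.0, 3.0, 4.0, 5.0],
                               kernel=[0.5, 0.5], dilation=2)
print(aggregated, smoothed)
```

In this toy example the occluded object still receives a representation, built from its two visible neighbors, which is the intuition behind using graph structure to cope with missing object cues. Increasing `dilation` widens the temporal window each output sees without adding kernel taps.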
