Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis

📅 2025-06-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of enhancing autonomous driving systems’ causal understanding and safety testing capabilities for high-risk scenarios by integrating real-world traffic accident causality into egocentric synthetic videos. To this end, we propose Causal-VidSyn—the first causal-aware video diffusion model that jointly incorporates accident cause descriptions and driver gaze cues—and introduce Drive-Gaze, a large-scale driving gaze dataset. Causal-VidSyn features three synergistic components: a causal entity localization module, a gaze-conditioned selection module, and an accident cause question-answering module, enabling fine-grained causal control. Experiments demonstrate significant improvements over state-of-the-art methods in both video fidelity and causal sensitivity. The framework supports three key tasks: accident video editing, normal-to-accident video generation, and text-to-video synthesis. Collectively, this work establishes a novel paradigm for causality-driven robustness evaluation of autonomous driving systems.

📝 Abstract
Egocentrically comprehending the causes and effects of car accidents is crucial for the safety of self-driving cars, and synthesizing causal-entity reflected accident videos can facilitate testing the capability to respond to accidents that would be prohibitively costly or dangerous to reproduce in reality. However, incorporating causal relations as seen in real-world videos into synthetic videos remains challenging. This work argues that precisely identifying the accident participants and capturing their related behaviors are of critical importance. In this regard, we propose a novel diffusion model, Causal-VidSyn, for synthesizing egocentric traffic accident videos. To enable causal entity grounding in video diffusion, Causal-VidSyn leverages cause descriptions and driver fixations to identify the accident participants and behaviors, facilitated by accident reason answering and gaze-conditioned selection modules. To support Causal-VidSyn, we further construct Drive-Gaze, the largest driver gaze dataset (with 1.54M frames of fixations) in driving accident scenarios. Extensive experiments show that Causal-VidSyn surpasses state-of-the-art video diffusion models in terms of frame quality and causal sensitivity in various tasks, including accident video editing, normal-to-accident video diffusion, and text-to-video generation.
Problem

Research questions and friction points this paper is trying to address.

Synthesizing causal-entity reflected accident videos for self-driving safety tests
Incorporating real-world causal relations into synthetic accident videos
Identifying accident participants and behaviors for accurate video synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion model for accident video synthesis
Leverages cause descriptions and driver fixations
Incorporates accident reason answering modules
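The gaze-conditioned selection idea above can be illustrated with a minimal sketch. This is a hypothetical toy (the function name `gaze_conditioned_select`, the list-based features, and the top-k rule are all assumptions, not the paper's implementation): given per-token features from a video frame and a per-token driver-gaze saliency weight, keep only the most-fixated tokens so that downstream conditioning focuses on likely accident participants.

```python
# Hypothetical sketch of gaze-conditioned token selection (NOT the paper's code).
# Assumes tokens are plain feature vectors and gaze saliency is a per-token
# fixation weight in [0, 1]; real models would use tensors and attention.

def gaze_conditioned_select(tokens, gaze_saliency, k):
    """Keep the k tokens with the highest driver-gaze saliency.

    tokens:        list of per-token feature vectors
    gaze_saliency: list of fixation weights, one per token
    k:             number of tokens to retain
    """
    assert len(tokens) == len(gaze_saliency)
    # Rank token indices by saliency, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: gaze_saliency[i], reverse=True)
    # Re-sort the kept indices so spatial order is preserved.
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep], keep

# Toy usage: 5 tokens, driver gaze concentrated on tokens 1 and 3.
feats = [[0.0], [1.0], [2.0], [3.0], [4.0]]
sal = [0.05, 0.9, 0.1, 0.8, 0.02]
kept, idx = gaze_conditioned_select(feats, sal, k=2)
print(idx)  # → [1, 3]
```

In the actual model the selected tokens would condition the diffusion backbone (e.g. via cross-attention) together with the accident cause description; the sketch only conveys the selection step.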