🤖 AI Summary
This study addresses the challenge of enhancing the causal understanding and safety testing of autonomous driving systems in high-risk scenarios by integrating real-world traffic accident causality into egocentric synthetic videos. To this end, we propose Causal-VidSyn, the first causal-aware video diffusion model that jointly incorporates accident cause descriptions and driver gaze cues, and introduce Drive-Gaze, a large-scale driving gaze dataset. Causal-VidSyn features three synergistic components for fine-grained causal control: a causal entity localization module, a gaze-conditioned selection module, and an accident cause question-answering module. Experiments demonstrate significant improvements over state-of-the-art methods in both video fidelity and causal sensitivity across three key tasks: accident video editing, normal-to-accident video generation, and text-to-video synthesis. Collectively, this work establishes a novel paradigm for causality-driven robustness evaluation of autonomous driving systems.
📝 Abstract
Egocentrically comprehending the causes and effects of car accidents is crucial for the safety of self-driving cars, and synthesizing accident videos that reflect causal entities can help test the capability to respond to accidents that are too costly or dangerous to reproduce in reality. However, incorporating causal relations, as observed in real-world videos, into synthetic videos remains challenging. This work argues that precisely identifying the accident participants and capturing their related behaviors are of critical importance. Accordingly, we propose a novel diffusion model, Causal-VidSyn, for synthesizing egocentric traffic accident videos. To enable causal entity grounding in video diffusion, Causal-VidSyn leverages cause descriptions and driver fixations to identify accident participants and their behaviors, facilitated by accident reason answering and gaze-conditioned selection modules. To support Causal-VidSyn, we further construct Drive-Gaze, the largest driver gaze dataset (1.54M frames of fixations) in driving accident scenarios. Extensive experiments show that Causal-VidSyn surpasses state-of-the-art video diffusion models in frame quality and causal sensitivity across various tasks, including accident video editing, normal-to-accident video diffusion, and text-to-video generation.
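To make the gaze-conditioned selection idea concrete, the sketch below shows one plausible way such a module could sit inside a video diffusion backbone: tokens with high driver-gaze saliency are selected and refined against the encoded accident-cause description, while the remaining tokens pass through unchanged. This is a minimal illustrative sketch, not the paper's implementation; the class name `GazeConditionedSelection`, the tensor shapes, the top-k ratio, and the cross-attention refinement are all assumptions made for this example.

```python
# Illustrative sketch (not the authors' code): a gaze-conditioned token
# selection block for a video diffusion backbone. Tokens whose spatial
# locations attract high driver-gaze saliency are selected and refined
# with the accident-cause text embedding; shapes, names, and the top-k
# strategy are assumptions made for this example.
import torch
import torch.nn as nn


class GazeConditionedSelection(nn.Module):
    def __init__(self, dim: int, select_ratio: float = 0.25, heads: int = 8):
        super().__init__()
        self.select_ratio = select_ratio
        # Cross-attention lets the gaze-selected video tokens attend to
        # the accident-cause description tokens.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens, gaze_scores, cause_emb):
        """
        tokens:      [B, N, D] video latent tokens
        gaze_scores: [B, N]    per-token gaze saliency in [0, 1]
        cause_emb:   [B, T, D] encoded accident-cause description
        """
        B, N, D = tokens.shape
        k = max(1, int(N * self.select_ratio))

        # Pick the k tokens the driver is most likely fixating on.
        topk = gaze_scores.topk(k, dim=1).indices            # [B, k]
        idx = topk.unsqueeze(-1).expand(-1, -1, D)           # [B, k, D]
        selected = tokens.gather(1, idx)                     # [B, k, D]

        # Refine only the gaze-selected tokens with the cause text,
        # so causal entities receive targeted conditioning.
        refined, _ = self.cross_attn(self.norm(selected), cause_emb, cause_emb)
        refined = selected + refined

        # Scatter refined tokens back; unselected tokens pass through.
        return tokens.scatter(1, idx, refined)


if __name__ == "__main__":
    block = GazeConditionedSelection(dim=64)
    out = block(torch.randn(2, 196, 64),   # video tokens
                torch.rand(2, 196),        # gaze saliency per token
                torch.randn(2, 16, 64))    # cause-description embedding
    print(out.shape)  # torch.Size([2, 196, 64])
```

The design choice sketched here, conditioning only the gaze-selected tokens on the cause text rather than the whole frame uniformly, mirrors the abstract's emphasis on grounding the specific accident participants and their behaviors.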