Composing Driving Worlds through Disentangled Control for Adversarial Scenario Generation

📅 2026-03-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of synthesizing safety-critical edge cases in autonomous driving, which often arise from anomalous combinations of common traffic elements. To this end, the authors propose CompoSIA, a composable driving video simulator that decouples scene structure, object identity, and ego-vehicle motion. By integrating a noise-level identity injection mechanism with a hierarchical dual-branch motion control strategy, CompoSIA achieves, for the first time, pose-invariant identity replacement and high-fidelity motion manipulation. The method enables systematic generation of hazardous yet plausible adversarial scenarios, improving the FVD metric for identity editing by 17% and reducing rotational and translational errors in motion control by 30% and 47%, respectively. In downstream stress testing, it increases the average collision rate within three seconds by 173%.

📝 Abstract

A major challenge in autonomous driving is the "long tail" of safety-critical edge cases, which often emerge from unusual combinations of common traffic elements. Synthesizing these scenarios is crucial, yet current controllable generative models provide incomplete or entangled guidance, preventing the independent manipulation of scene structure, object identity, and ego actions. We introduce CompoSIA, a compositional driving video simulator that disentangles these traffic factors, enabling fine-grained control over diverse adversarial driving scenarios. To support controllable identity replacement of scene elements, we propose a noise-level identity injection mechanism that allows pose-agnostic identity generation across diverse element poses, all from a single reference image. Furthermore, we introduce a hierarchical dual-branch action control mechanism to improve action controllability. Such disentangled control enables adversarial scenario synthesis: systematically combining safe elements into dangerous configurations that entangled generators cannot produce. Extensive comparisons demonstrate superior controllable generation quality over state-of-the-art baselines, with a 17% improvement in FVD for identity editing and reductions of 30% and 47% in rotation and translation errors for action control. Furthermore, downstream stress-testing reveals substantial planner failures: across editing modalities, the average 3-second collision rate increases by 173%.
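The reported percentages are relative changes, so it may help to see what they imply as multipliers on a baseline value. The sketch below is illustrative arithmetic only: the helper names and the baseline of 1.0 are hypothetical and do not come from the paper, and it assumes the 17% FVD improvement means a 17% lower FVD (lower FVD is better).

```python
def relative_change(baseline: float, new: float) -> float:
    """Return the change from baseline to new as a percentage of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


def apply_relative_change(baseline: float, percent: float) -> float:
    """Return the value obtained by changing baseline by the given percentage."""
    return baseline * (1.0 + percent / 100.0)


# Hypothetical normalized baseline of 1.0 (NOT a number from the paper).
baseline = 1.0

# A 173% increase means the collision rate becomes 2.73 times the baseline.
collision_multiplier = apply_relative_change(baseline, 173.0)

# Reductions of 30% and 47% leave 0.70x and 0.53x of the baseline error.
rotation_multiplier = apply_relative_change(baseline, -30.0)
translation_multiplier = apply_relative_change(baseline, -47.0)

# Assumption: a 17% FVD improvement is a 17% decrease, i.e. 0.83x the baseline.
fvd_multiplier = apply_relative_change(baseline, -17.0)
```

In other words, a 173% increase is close to a tripling of the collision rate, not a 1.73-fold rate.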

Problem

Research questions and friction points this paper is trying to address.

autonomous driving

adversarial scenario generation

disentangled control

long-tail safety

controllable generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

disentangled control

adversarial scenario generation

identity injection