🤖 AI Summary
Existing autonomous driving safety evaluation relies on predefined threat patterns, limiting its ability to uncover diverse and unforeseen failure scenarios. To address this, we propose ScenGE, a novel framework that uses large language models (LLMs), grounded in structured driving knowledge, for meta-scenario reasoning and turns the inferred adversarial cases into executable scenario code. ScenGE then constructs an adversarial collaborator graph to drive background-vehicle co-evolution, dynamically constraining the ego vehicle's navigable space and introducing critical occlusions. The framework can be deployed across different simulators and also applies to large-model-based autonomous driving systems. Experiments show that ScenGE uncovers 31.96% more severe collision cases on average across multiple reinforcement learning models, and adversarial training on the generated scenarios improves model robustness. Real-world vehicle tests and human evaluations confirm both the plausibility and the severity of the generated scenarios.
📝 Abstract
The generation of safety-critical scenarios in simulation has become increasingly crucial for evaluating the safety of autonomous vehicles prior to on-road deployment. However, current approaches largely rely on predefined threat patterns or rule-based strategies, which limits their ability to expose diverse and unforeseen failure modes. To overcome these limitations, we propose ScenGE, a framework that generates plentiful safety-critical scenarios by reasoning about novel adversarial cases and then amplifying them with complex traffic flows. Given a simple prompt describing a benign scene, it first performs Meta-Scenario Generation, in which a large language model, grounded in structured driving knowledge, infers an adversarial agent whose behavior poses a threat that is both plausible and deliberately challenging. This meta-scenario is then specified in executable code for precise in-simulator control. Subsequently, Complex Scenario Evolution uses background vehicles to amplify the core threat introduced by the meta-scenario: it builds an adversarial collaborator graph to identify the key agent trajectories to optimize, with perturbations designed to simultaneously reduce the ego vehicle's maneuvering space and create critical occlusions. Extensive experiments on multiple reinforcement-learning-based AV models show that ScenGE uncovers more severe collision cases (+31.96% on average) than SoTA baselines. ScenGE can also be applied to large-model-based AV systems and deployed on different simulators; we further observe that adversarial training on our scenarios improves model robustness. Finally, we validate the framework through real-world vehicle tests and human evaluation, confirming that the generated scenarios are both plausible and critical. We hope this work marks a critical step towards building public trust in autonomous vehicles and ensuring their safe deployment.
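The two-stage pipeline described above can be sketched in a few lines. This is a minimal illustration only: every name, data structure, and the stub standing in for the LLM is hypothetical, not taken from the paper's actual implementation, which grounds a real LLM in structured driving knowledge and emits executable simulator code.

```python
# Hypothetical sketch of ScenGE's two-stage pipeline (illustrative names only).

def meta_scenario_generation(benign_scene, query_llm):
    """Stage 1: infer a plausible, challenging adversarial agent for a benign scene."""
    adversary = query_llm(
        f"Given the benign scene '{benign_scene}', propose one adversarial agent "
        "whose behavior is plausible yet safety-critical."
    )
    # In the real framework, the meta-scenario is specified as executable code
    # for precise in-simulator control; here it is just a dictionary.
    return {"scene": benign_scene, "adversary": adversary}

def complex_scenario_evolution(meta_scenario, background_vehicles):
    """Stage 2: amplify the core threat using background vehicles.

    A collaborator graph links background vehicles to the adversary; their
    trajectories would then be perturbed to shrink the ego vehicle's
    maneuvering space and create occlusions (optimization omitted here).
    """
    graph = {meta_scenario["adversary"]: list(background_vehicles)}
    return {**meta_scenario, "collaborator_graph": graph}

if __name__ == "__main__":
    stub_llm = lambda prompt: "hard-braking lead vehicle"  # placeholder for a real LLM
    meta = meta_scenario_generation("two-lane highway, light traffic", stub_llm)
    scenario = complex_scenario_evolution(meta, ["bv_1", "bv_2"])
    print(scenario["collaborator_graph"])
```

The sketch only conveys the control flow: an LLM-driven first stage that produces a core threat, followed by a graph-structured second stage that recruits background traffic to intensify it.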