AI Summary
This work addresses the inefficiency of conventional autonomous driving safety testing, which relies on extensive real-world road trials and seldom surfaces the rare, safety-critical edge cases that matter most. The authors propose STRELGen, which combines differentiable Spatio-Temporal Logic (STREL) specifications with a multi-agent trajectory diffusion model. Because satisfaction of the specifications can be monitored differentiably, the method runs gradient-based optimization over the diffusion model's latent space at inference time, steering generation toward realistic traffic scenarios that stay within the learned data distribution while satisfying user-specified safety and realism properties. The framework supports interpretable, efficient, and controllable synthesis of critical test scenarios for stress-testing autonomous driving systems.
Abstract
The rapid advancement of autonomous driving (AD) technologies has outpaced the development of robust safety evaluation methods. Conventional testing relies on exposing AD systems to vast numbers of real-world traffic scenes -- a brute-force approach that is prohibitively expensive and statistically ineffective at capturing the rare, safety-critical edge cases essential for validating real-world robustness. To address this fundamental limitation, we introduce STRELGen, a scalable framework for the targeted generation of safety-critical driving scenarios. STRELGen synergistically combines a multi-agent trajectory-generation diffusion model (DM) with Spatio-Temporal Logic (STREL) specifications that encode complex safety and realism properties through a highly interpretable formalism. Crucially, monitoring satisfaction levels of these specifications is differentiable, enabling gradient-based search. At inference time, we optimize directly over the DM latent space to maximize STREL formula satisfaction. The result is efficient generation of highly plausible yet safety-critical multi-agent scenarios that lie within the learned data distribution. STRELGen thus provides a flexible, interpretable, and powerful tool for stress-testing autonomous driving systems, moving beyond the limitations of brute-force data collection.
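The inference-time search described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from STRELGen: `decode` is a hypothetical stand-in for the diffusion model (the latent is simply one agent's lateral steps), the specification uses only temporal "eventually" and "always" operators rather than STREL's spatial ones, robustness is smoothed with log-sum-exp, and gradients are approximated by finite differences so the example needs no autodiff library.

```python
import math

T = 10           # toy horizon (number of time steps)
START_GAP = 3.5  # initial lateral gap between two agents (assumed value)
D_CRIT = 1.0     # gap below which the scene counts as a near miss (assumed)
V_MAX = 0.6      # largest lateral step per time step deemed realistic (assumed)


def soft_min(values, temp=0.05):
    """Smooth lower bound on min(values) via log-sum-exp."""
    m = min(values)
    return m - temp * math.log(sum(math.exp(-(v - m) / temp) for v in values))


def soft_max(values, temp=0.05):
    """Smooth upper bound on max(values)."""
    return -soft_min([-v for v in values], temp)


def decode(z):
    """Hypothetical stand-in for the diffusion decoder.

    The latent z holds one agent's lateral step at each time step. Both agents
    share the same longitudinal position, so the result is the gap over time.
    """
    gap, gaps = START_GAP, []
    for step in z:
        gap += step
        gaps.append(abs(gap))
    return gaps


def robustness(z):
    """Smoothed robustness of: eventually(gap < D_CRIT) and always(|step| <= V_MAX)."""
    near_miss = soft_max([D_CRIT - g for g in decode(z)])    # eventually
    realistic = soft_min([V_MAX - abs(step) for step in z])  # always
    return soft_min([near_miss, realistic])                  # conjunction


def grad(f, z, eps=1e-5):
    """Central finite differences; a real system would use autodiff instead."""
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += eps
        zm[i] -= eps
        g.append((f(zp) - f(zm)) / (2 * eps))
    return g


def optimize_latent(steps=200, lr=0.01):
    """Gradient ascent on the robustness, keeping the best latent seen."""
    z = [0.0] * T
    best_z, best_rob = list(z), robustness(z)
    for _ in range(steps):
        g = grad(robustness, z)
        z = [zi + lr * gi for zi, gi in zip(z, g)]
        rob = robustness(z)
        if rob > best_rob:
            best_z, best_rob = list(z), rob
    return best_z, best_rob
```

Calling `optimize_latent()` returns the best latent found and its smoothed robustness. A positive value indicates that the near-miss and realism conditions both hold, up to the smoothing error of `soft_max`. In this sketch the realism constraint is part of the formula itself, which mirrors the abstract's statement that the specifications encode both safety and realism properties.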