🤖 AI Summary
Existing methods for generating safety-critical scenarios in autonomous driving struggle to balance physical feasibility against how strongly they challenge the system, and often yield physically infeasible or controller-dependent test cases. This work proposes ScenePilot, a feasibility-guided, boundary-driven scenario generation framework that targets the “boundary band”: scenarios that are physically solvable in principle yet still cause the deployed autonomy stack to fail. Generation is formulated as constrained multi-objective reinforcement learning that combines a Responsibility-Sensitive Safety (RSS)-derived physical-feasibility score with an online-learned AV-risk predictor, together with step-level feasibility-aware shielding that keeps exploration near the feasibility boundary while avoiding infeasible artifacts. On SafeBench with multiple planners, the method raises collision rates by 6.2 percentage points while preserving physical validity, and adversarial fine-tuning on the generated scenarios consistently reduces downstream crash rates.
📝 Abstract
Safety-critical scenarios are central to evaluating autonomous driving systems, yet their rarity in naturalistic logs makes simulation-based stress testing indispensable. Most scenario generation methods treat surrounding agents as adversaries, but they either (i) induce failures without explicitly modeling vehicle-road physical limits, yielding visually extreme yet physically unsolvable crashes, or (ii) enforce physical feasibility or policy feasibility in isolation, which can over-focus on aggressive maneuvers or remain tied to a controller-dependent capability boundary. We propose ScenePilot, a feasibility-guided, boundary-driven framework that targets the boundary band: scenarios that are physically solvable in principle yet still cause the deployed autonomy stack to fail. We formulate generation as constrained multi-objective reinforcement learning, combining an RSS-derived physical-feasibility score $σ$ with an online-learned AV-risk predictor $Φ$, and introduce step-level feasibility-aware shielding to keep exploration near the feasibility boundary while avoiding infeasible artifacts. Experiments on SafeBench with multiple planners show that ScenePilot yields substantially higher collision rates (+6.2 percentage points) while preserving physical validity, and that adversarial fine-tuning on these boundary-band scenarios consistently reduces downstream crash rates. The code is available at https://github.com/QiyuRuan/ScenePilot.
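To make the roles of the feasibility score and the shielding step concrete, the following is a minimal sketch and not ScenePilot's implementation. Only the RSS longitudinal safe-distance formula is standard. The parameter defaults, the mapping from gap to a score in [0, 1] (`feasibility_score`), the threshold `sigma_min`, and the selection rule in `shielded_choice` are all assumptions made for illustration; the abstract does not specify how $σ$ or the shield is actually computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSSParams:
    # Illustrative values only; not taken from the paper.
    response_time: float = 1.0  # rho: reaction time of the rear vehicle (s)
    max_accel: float = 3.0      # worst-case rear acceleration during rho (m/s^2)
    min_brake: float = 4.0      # guaranteed rear braking after rho (m/s^2)
    max_brake: float = 8.0      # hardest braking the front vehicle may apply (m/s^2)


def rss_min_gap(v_rear: float, v_front: float, p: RSSParams = RSSParams()) -> float:
    """Standard RSS minimum safe longitudinal gap (m), clipped at zero."""
    v_after = v_rear + p.response_time * p.max_accel  # rear speed after reacting
    d = (
        v_rear * p.response_time
        + 0.5 * p.max_accel * p.response_time ** 2
        + v_after ** 2 / (2.0 * p.min_brake)
        - v_front ** 2 / (2.0 * p.max_brake)
    )
    return max(d, 0.0)


def feasibility_score(gap: float, v_rear: float, v_front: float,
                      p: RSSParams = RSSParams()) -> float:
    """Hypothetical stand-in for sigma: ratio of actual gap to RSS gap, in [0, 1]."""
    d_min = rss_min_gap(v_rear, v_front, p)
    if d_min == 0.0:
        return 1.0  # no gap is required, so any gap is treated as fully solvable
    return min(max(gap / d_min, 0.0), 1.0)


def shielded_choice(candidates, sigma_min: float):
    """Hypothetical step-level shield over (action, sigma, risk) tuples.

    Keeps actions whose sigma is at least sigma_min and returns the one with
    the highest predicted risk. If none pass, falls back to the most feasible
    action instead of emitting an infeasible one.
    """
    feasible = [c for c in candidates if c[1] >= sigma_min]
    if feasible:
        return max(feasible, key=lambda c: c[2])[0]
    return max(candidates, key=lambda c: c[1])[0]


if __name__ == "__main__":
    print(rss_min_gap(10.0, 10.0))
    options = [("brake", 0.9, 0.2), ("cut_in", 0.6, 0.8), ("teleport", 0.1, 0.99)]
    print(shielded_choice(options, sigma_min=0.5))
```

In this toy setup the riskiest candidate is rejected because its score falls below the threshold, and the shield settles on the riskiest action that remains solvable. That is the intuition behind targeting the boundary band rather than maximizing risk alone.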