An LLM-driven Scenario Generation Pipeline Using an Extended Scenic DSL for Autonomous Driving Safety Validation

📅 2026-02-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a bottleneck in scalable autonomous driving safety validation: efficiently translating real-world multimodal accident reports, which combine textual descriptions and hand-drawn sketches, into high-fidelity, executable simulation scenarios. The authors propose a framework built on a large language model (GPT-4o mini) and an extended version of the Scenic domain-specific language. The extended Scenic DSL serves as a probabilistic intermediate representation that decouples high-level semantic understanding from low-level scenario rendering, which enables automated extraction of semantic scenario configurations and generation of simulatable test cases. Evaluated on cases from the NHTSA CIREN database, the approach achieves 100% accuracy in reconstructing environmental and road network attributes, and 97% and 98% accuracy for oracle and actor trajectories, respectively, relative to human-derived ground truth. When executed in the CARLA simulator with the Autoware driving stack, the generated scenarios consistently trigger the intended traffic-rule violations across 2,000 scenario variations.
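The decoupling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `ScenarioConfig` fields, the `Range` type, the uniform sampling, and the Scenic-like output text are all assumptions made for illustration, and the rendered text is not validated Scenic syntax. The point is only that the semantic configuration (what an LLM might extract) stays separate from the step that samples and renders concrete variants.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Uniform interval standing in for one probabilistic parameter (assumption)."""
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        return rng.uniform(self.low, self.high)


@dataclass(frozen=True)
class ScenarioConfig:
    """Hypothetical semantic configuration extracted from a crash report."""
    weather: str
    road_type: str
    ego_speed: Range     # m/s
    actor_speed: Range   # m/s
    actor_behavior: str


def sample_variant(cfg: ScenarioConfig, rng: random.Random) -> dict:
    """Draw one concrete variant from the probabilistic configuration."""
    return {
        "weather": cfg.weather,
        "road_type": cfg.road_type,
        "ego_speed": cfg.ego_speed.sample(rng),
        "actor_speed": cfg.actor_speed.sample(rng),
        "actor_behavior": cfg.actor_behavior,
    }


def render(variant: dict) -> str:
    """Render a variant as Scenic-like text (illustrative only, not real Scenic)."""
    return "\n".join([
        f"# weather: {variant['weather']}, road: {variant['road_type']}",
        f"ego = Car with speed {variant['ego_speed']:.2f}",
        f"actor = Car with speed {variant['actor_speed']:.2f}, "
        f"with behavior {variant['actor_behavior']}",
    ])


# Toy configuration; the values are invented, not taken from any crash report.
config = ScenarioConfig(
    weather="ClearNoon",
    road_type="two_lane_undivided",
    ego_speed=Range(8.0, 12.0),
    actor_speed=Range(10.0, 15.0),
    actor_behavior="CrossIntoOppositeLane",
)
rng = random.Random(0)
variants = [sample_variant(config, rng) for _ in range(2000)]
print(render(variants[0]))
```

Because rendering only consumes a sampled variant, the extraction step can be checked against ground truth on its own, and the same configuration can yield many scenario variations.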

📝 Abstract

Real-world crash reports, which combine textual summaries and sketches, are valuable for scenario-based testing of autonomous driving systems (ADS). However, current methods cannot effectively translate this multimodal data into precise, executable simulation scenarios, hindering the scalability of ADS safety validation. In this work, we propose a scalable and verifiable pipeline that uses a large language model (GPT-4o mini) and a probabilistic intermediate representation (an Extended Scenic domain-specific language) to automatically extract semantic scenario configurations from crash reports and generate corresponding simulation-ready scenarios. Unlike earlier approaches such as ScenicNL and LCTGen (which generate scenarios directly from text) or TARGET (which uses deterministic mappings from traffic rules), our method introduces an intermediate Scenic DSL layer to separate high-level semantic understanding from low-level scenario rendering, reducing errors and capturing real-world variability. We evaluated the pipeline on cases from the NHTSA CIREN database. The results show high accuracy in knowledge extraction: 100% correctness for environmental and road network attributes, and 97% and 98% for oracle and actor trajectories, respectively, compared to human-derived ground truth. We executed the generated scenarios in the CARLA simulator using the Autoware driving stack, and they consistently triggered the intended traffic-rule violations (such as opposite-lane crossing and red-light running) across 2,000 scenario variations. These findings demonstrate that the proposed pipeline provides a legally grounded, scalable, and verifiable approach to ADS safety validation.
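The two kinds of results reported in the abstract, extraction accuracy against human-derived ground truth and the share of scenario variations that trigger the intended violation, reduce to simple ratios. The sketch below shows one plausible way to compute them. The function names, the field-level matching rule, and the toy data are assumptions; the abstract does not specify how the authors scored their results.

```python
def attribute_accuracy(extracted: list[dict], truth: list[dict], keys: list[str]) -> float:
    """Fraction of (case, attribute) pairs where the extracted value equals ground truth."""
    total = 0
    matches = 0
    for ext, ref in zip(extracted, truth):
        for key in keys:
            total += 1
            if ext.get(key) == ref[key]:
                matches += 1
    return matches / total if total else 0.0


def violation_rate(outcomes: list[bool]) -> float:
    """Fraction of simulated variants in which the intended violation was observed."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Toy data for illustration only; these are not results from the paper.
truth = [
    {"weather": "rain", "road_type": "intersection"},
    {"weather": "clear", "road_type": "two_lane"},
]
extracted = [
    {"weather": "rain", "road_type": "intersection"},
    {"weather": "clear", "road_type": "four_lane"},
]
accuracy = attribute_accuracy(extracted, truth, ["weather", "road_type"])
rate = violation_rate([True, True, True, False])
print(f"attribute accuracy: {accuracy:.2f}, violation rate: {rate:.2f}")
```

Scoring per attribute rather than per case lets a single mismatched field lower the score without discarding the fields that were extracted correctly.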

Problem

Research questions and friction points this paper is trying to address.

autonomous driving safety validation

scenario generation

crash report

multimodal data

simulation scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Model

Scenario Generation

Extended Scenic DSL