🤖 AI Summary
To address insufficient diversity and limited coverage of critical scenarios in natural language–driven traffic scene generation for autonomous driving simulation, this paper proposes a large language model (LLM)–driven end-to-end text-to-scene generation framework. The method integrates semantic parsing, vector-based retrieval, multi-factor road ranking, and joint planning of road selection and agent behaviors, removing the reliance on predefined trajectories. It enables semantically controllable generation of both routine and high-risk driving scenarios and interfaces directly with the CARLA simulator. Evaluated on the SafeBench benchmark, the framework reduces the average collision rate from 8.0% to 3.5%, and it also improves narration and reasoning on driving captioning tasks. The work offers a scalable, interpretable approach to safety-critical scenario generation for autonomous driving validation.
📝 Abstract
Existing text-to-scene generation methods typically generate key scenarios along predetermined paths, which limits environmental diversity. To address this constraint, we propose a novel text-to-traffic scene framework that leverages a large language model (LLM) to autonomously generate diverse traffic scenarios for the CARLA simulator from natural language descriptions. Our pipeline comprises five stages: (1) Prompt Analysis, where natural language inputs are decomposed; (2) Road Retrieval, which selects candidate roads from a database; (3) Agent Planning, which details agent types and behaviors; (4) Road Ranking, which scores roads against the scenario requirements; and (5) Scene Generation, which renders the planned scenario in the simulator. The framework supports both routine and critical traffic scenarios. We demonstrate that our approach not only diversifies agent planning and road selection but also reduces the average collision rate from 8% to 3.5% in SafeBench. Additionally, our framework improves narration and reasoning for driving captioning tasks. Our contributions and resources are publicly available at https://basiclab.github.io/TTSG.
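The five stages listed in the abstract can be sketched as a small Python pipeline. This is only an illustrative toy under stated assumptions, not the authors' implementation: the `Road` record, the three-entry road database, the keyword-based `analyze_prompt` (standing in for the LLM), the hand-written embeddings, and the ranking weights are all hypothetical, and the final stage returns a plain dictionary instead of calling CARLA.

```python
import math
from dataclasses import dataclass


@dataclass
class Road:
    road_id: str
    kind: str         # e.g. "intersection" or "straight"
    lanes: int
    embedding: tuple  # toy feature vector (assumed, hand-written)


# Hypothetical road database; a real system would index simulator map data.
ROADS = [
    Road("r1", "intersection", 2, (1.0, 0.0, 0.2)),
    Road("r2", "straight", 4, (0.0, 1.0, 0.1)),
    Road("r3", "intersection", 4, (0.9, 0.1, 0.8)),
]


def analyze_prompt(prompt):
    """Stage 1: keyword matching as a stand-in for LLM prompt decomposition."""
    text = prompt.lower()
    wants_intersection = "intersection" in text
    return {
        "road_kind": "intersection" if wants_intersection else "straight",
        "min_lanes": 4 if "multi-lane" in text else 1,
        "agents": [a for a in ("pedestrian", "cyclist", "truck") if a in text],
        "query": (1.0, 0.0, 0.5) if wants_intersection else (0.0, 1.0, 0.0),
    }


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_roads(spec, k=2):
    """Stage 2: keep the k roads whose embeddings are closest to the query."""
    by_similarity = sorted(
        ROADS, key=lambda r: cosine(spec["query"], r.embedding), reverse=True
    )
    return by_similarity[:k]


def plan_agents(spec):
    """Stage 3: assign each requested agent type a (toy) behavior."""
    return [
        {"type": a, "behavior": "cross_road" if a == "pedestrian" else "follow_lane"}
        for a in spec["agents"]
    ]


def rank_roads(spec, candidates):
    """Stage 4: weighted multi-factor score; the weights are arbitrary."""
    def score(road):
        kind_match = 1.0 if road.kind == spec["road_kind"] else 0.0
        lane_match = 1.0 if road.lanes >= spec["min_lanes"] else 0.0
        similarity = cosine(spec["query"], road.embedding)
        return 0.5 * kind_match + 0.3 * lane_match + 0.2 * similarity

    return sorted(candidates, key=score, reverse=True)


def generate_scene(road, agents):
    """Stage 5: return a scene description (a real system would spawn it in the simulator)."""
    return {"road": road.road_id, "agents": agents}


def text_to_scene(prompt):
    spec = analyze_prompt(prompt)
    candidates = retrieve_roads(spec)
    agents = plan_agents(spec)
    best_road = rank_roads(spec, candidates)[0]
    return generate_scene(best_road, agents)


scene = text_to_scene("A pedestrian crosses a multi-lane intersection")
```

With this toy data, retrieval keeps the two intersections and the ranker then prefers the four-lane one (`r3`), because the lane requirement is only checked at the ranking stage. Separating a cheap similarity search from a stricter multi-factor score is one plausible reason for having retrieval and ranking as distinct stages.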