Traffic Scene Generation from Natural Language Description for Autonomous Vehicles with Large Language Model

📅 2024-09-15

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

To address the limited diversity and narrow coverage of critical scenarios in natural-language-driven traffic scene generation for autonomous driving simulation, this paper proposes an end-to-end text-to-scene generation framework driven by a large language model (LLM). The method combines prompt analysis, vector-based road retrieval, multi-factor road ranking, and agent behavior planning, so that generated scenes are no longer tied to predetermined paths. It produces both routine and safety-critical driving scenarios from text descriptions and renders them in the CARLA simulator. On the SafeBench benchmark, the framework reduces the average collision rate from 8.0% to 3.5% (a relative reduction of about 56%), and it also improves narration and reasoning on driving captioning tasks. The framework targets safety-critical scenario generation for autonomous driving validation.
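The summary names vector-based retrieval and multi-factor road ranking without giving details. The following is a minimal sketch of how those two steps could fit together, not the authors' implementation. It assumes cosine similarity for retrieval and a weighted feature match for ranking. The toy embeddings, the road records, and the names `retrieve_roads`, `rank_roads`, and `ROAD_DB` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_roads(query_vec, road_db, top_k=2):
    """Return the top_k roads whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_vec, road["embedding"]), road)
              for road in road_db]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [road for _, road in scored[:top_k]]


def rank_roads(candidates, requirements, weights):
    """Order candidates by a weighted count of matched requirements.

    Assumption: each factor contributes its weight when the road's
    feature equals the required value, and nothing otherwise.
    """
    def score(road):
        return sum(weight for factor, weight in weights.items()
                   if road["features"].get(factor) == requirements.get(factor))
    return sorted(candidates, key=score, reverse=True)


# Hypothetical toy database; real embeddings would come from a text encoder.
ROAD_DB = [
    {"id": "road_a", "embedding": [1.0, 0.0, 0.0],
     "features": {"lanes": 2, "has_intersection": True}},
    {"id": "road_b", "embedding": [0.9, 0.1, 0.0],
     "features": {"lanes": 3, "has_intersection": True}},
    {"id": "road_c", "embedding": [0.0, 1.0, 0.0],
     "features": {"lanes": 1, "has_intersection": False}},
]

candidates = retrieve_roads([1.0, 0.05, 0.0], ROAD_DB, top_k=2)
ranked = rank_roads(
    candidates,
    requirements={"lanes": 3, "has_intersection": True},
    weights={"lanes": 0.6, "has_intersection": 0.4},
)
print([road["id"] for road in ranked])
```

Splitting retrieval from ranking keeps the similarity search cheap and lets the ranking step apply scenario-specific constraints to a short candidate list only.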

📝 Abstract

Text-to-scene generation typically limits environmental diversity by generating key scenarios along predetermined paths. To address these constraints, we propose a novel text-to-traffic scene framework that leverages a large language model (LLM) to autonomously generate diverse traffic scenarios for the CARLA simulator based on natural language descriptions. Our pipeline comprises several key stages: (1) Prompt Analysis, where natural language inputs are decomposed; (2) Road Retrieval, selecting optimal roads from a database; (3) Agent Planning, detailing agent types and behaviors; (4) Road Ranking, scoring roads to match scenario requirements; and (5) Scene Generation, rendering the planned scenarios in the simulator. This framework supports both routine and critical traffic scenarios, enhancing its applicability. We demonstrate that our approach not only diversifies agent planning and road selection but also significantly reduces the average collision rate from 8% to 3.5% in SafeBench. Additionally, our framework improves narration and reasoning for driving captioning tasks. Our contributions and resources are publicly available at https://basiclab.github.io/TTSG.
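The five stages listed in the abstract can be read as a simple data flow. The sketch below is illustrative only and runs the stages in the order the abstract lists them. A keyword filter stands in for the LLM, a small tag dictionary stands in for the road database, and the final stage returns a plain scene specification instead of calling CARLA. Every name here (`ScenePlan`, `TOY_ROADS`, `run_pipeline`, and the stage functions) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ScenePlan:
    """State passed from stage to stage."""
    prompt: str
    keywords: list = field(default_factory=list)
    candidate_roads: list = field(default_factory=list)
    agents: list = field(default_factory=list)
    chosen_road: str = ""


# Hypothetical toy road database: road id -> descriptive tags.
TOY_ROADS = {
    "straight_2lane": {"straight"},
    "urban_intersection": {"intersection", "crosswalk"},
    "roundabout": {"roundabout"},
}


def prompt_analysis(plan):
    # Stand-in for the LLM step: keep only words from a fixed vocabulary.
    vocabulary = {"intersection", "crosswalk", "roundabout",
                  "straight", "pedestrian", "cyclist"}
    words = plan.prompt.lower().replace(",", " ").replace(".", " ").split()
    plan.keywords = [w for w in words if w in vocabulary]
    return plan


def road_retrieval(plan):
    # Keep every road that shares at least one tag with the prompt keywords.
    wanted = set(plan.keywords)
    plan.candidate_roads = [rid for rid, tags in TOY_ROADS.items()
                            if tags & wanted]
    return plan


def agent_planning(plan):
    # Always include the ego vehicle; add other agents named in the prompt.
    plan.agents = [{"type": "ego_vehicle", "behavior": "follow_route"}]
    if "pedestrian" in plan.keywords:
        plan.agents.append({"type": "pedestrian", "behavior": "cross_road"})
    if "cyclist" in plan.keywords:
        plan.agents.append({"type": "cyclist", "behavior": "ride_along_lane"})
    return plan


def road_ranking(plan):
    # Score each candidate by how many of its tags the prompt mentions.
    wanted = set(plan.keywords)
    if plan.candidate_roads:
        plan.chosen_road = max(plan.candidate_roads,
                               key=lambda rid: len(TOY_ROADS[rid] & wanted))
    return plan


def scene_generation(plan):
    # A real system would spawn actors in the simulator here.
    return {"road": plan.chosen_road, "agents": plan.agents}


def run_pipeline(prompt):
    plan = ScenePlan(prompt=prompt)
    for stage in (prompt_analysis, road_retrieval, agent_planning, road_ranking):
        plan = stage(plan)
    return scene_generation(plan)


scene = run_pipeline("A pedestrian suddenly crosses at an intersection with a crosswalk.")
print(scene)
```

Passing one plan object through every stage keeps each step independently replaceable, for example by swapping the keyword filter for an actual LLM call.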

Problem

Research questions and friction points this paper is trying to address.

Generate diverse traffic scenarios

Reduce collision rate in simulations

Enhance driving captioning narration

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM for traffic scene generation

Dynamic road and agent planning

Enhanced safety and scenario diversity

🔎 Similar Papers

Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian