🤖 AI Summary
Autonomous driving faces challenges in generating complex, rare scenarios and achieving high fidelity for safety-critical cases. This paper systematically surveys foundation models, including large language models (LLMs), vision-language models (VLMs), multimodal LLMs, diffusion models, and world models, for scenario generation and analysis. We propose, for the first time, a unified taxonomy tailored to autonomous driving. Our structured framework encompasses methods, datasets, simulation platforms, and evaluation metrics; we introduce novel domain-specific metrics and two analytical dimensions: causal fidelity and safety-critical fidelity. We publicly release an actively maintained literature repository and supplementary materials. Synthesizing over 100 state-of-the-art works, we catalog major open-source resources and explicitly identify key bottlenecks (e.g., insufficient causal modeling, distortion of safety-critical scenarios) alongside promising future directions. This work provides a systematic foundation for enhancing both diversity and realism in autonomous driving scenario generation.
📝 Abstract
For autonomous vehicles, safe navigation in complex environments depends on handling a broad range of diverse and rare driving scenarios. Simulation- and scenario-based testing have emerged as key approaches to the development and validation of autonomous driving systems. Traditional scenario generation relies on rule-based systems, knowledge-driven models, and data-driven synthesis, often producing limited diversity and unrealistic safety-critical cases. With the emergence of foundation models, which represent a new generation of pre-trained, general-purpose AI models, developers can process heterogeneous inputs (e.g., natural language, sensor data, HD maps, and control actions), enabling the synthesis and interpretation of complex driving scenarios. In this paper, we survey the application of foundation models to scenario generation and scenario analysis in autonomous driving (as of May 2025). Our survey presents a unified taxonomy that includes large language models, vision-language models, multimodal large language models, diffusion models, and world models for the generation and analysis of autonomous driving scenarios. In addition, we review the methodologies, open-source datasets, simulation platforms, and benchmark challenges, and we examine the evaluation metrics tailored explicitly to scenario generation and analysis. The survey concludes by highlighting open challenges and research questions and by outlining promising future research directions. All reviewed papers are listed in a continuously maintained repository, which contains supplementary materials and is available at https://github.com/TUM-AVS/FM-for-Scenario-Generation-Analysis.