🤖 AI Summary
Current autonomous driving systems often behave too conservatively in high-conflict mixed-traffic scenarios because they lack proactive interaction capabilities, which undermines public acceptance. This work proposes an interactive decision-making framework built on large language models (LLMs): it uses Object-Process Methodology (OPM) for semantic scene modeling, uses an LLM to infer both explicit and implicit intentions of traffic participants, and generates and optimizes executable trajectories under safety and efficiency constraints. To improve transparency, the system broadcasts its decisions in natural language. Experiments in a multi-agent driving simulator show that the proposed approach outperforms conventional baselines on safety, comfort, and efficiency metrics. In addition, a Turing-test-style evaluation indicates that its decisions exhibit a high degree of human-likeness.
📝 Abstract
In high-conflict mixed-traffic scenarios involving human-driven and autonomous vehicles, most existing autonomous driving systems default to overly conservative behaviors, lack proactive interaction, and consequently suffer from limited public acceptance. To mitigate intent misunderstandings and decision failures, we present a Large Language Model (LLM)-based interactive decision-making framework that augments scene understanding and intent-aware interaction to jointly improve safety and efficiency. The approach uses Object-Process Methodology (OPM) to semantically model complex multi-vehicle scenes, abstracting low-level perceptual data into objects, processes, and relations, thereby streamlining reasoning over latent causal structure. Building on this representation, the LLM parses both the explicit and implicit intents of surrounding agents and selects candidate maneuvers under jointly enforced safety and efficiency constraints. We then generate perturbed trajectory candidates via Monte Carlo sampling and evaluate them to obtain an optimized executable trajectory. To foster transparency and coordination with nearby road users, the LLM translates the final decision into concise natural-language messages, which are broadcast through an external Human-Machine Interface (eHMI), completing a closed loop from scene understanding to action to language. Experiments in a cluster driving simulator demonstrate that the proposed method outperforms traditional baselines across safety, comfort, and efficiency metrics, while a Turing-test-style evaluation indicates a high degree of human-likeness in decision making. Taken together, these results suggest that coupling semantic scene abstraction with LLM-mediated intent reasoning and language-based eHMI communication offers a practical pathway toward interactive, trustworthy autonomous driving in dense mixed traffic.
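The Monte Carlo step in the abstract (perturb a candidate trajectory, score the samples, keep the best) can be illustrated with a minimal sketch. The abstract does not specify the noise model, cost terms, or weights, so everything below is an assumption made for illustration: the function names (`perturb`, `cost`, `select_trajectory`), the Gaussian lateral noise, the hard minimum-clearance constraint, and the weights `w_safety` and `w_eff` are hypothetical and are not taken from the paper.

```python
import math
import random

def perturb(nominal, sigma, rng):
    """Add Gaussian lateral noise to every waypoint except the first (current pose)."""
    return [nominal[0]] + [(x, y + rng.gauss(0.0, sigma)) for x, y in nominal[1:]]

def min_clearance(traj, obstacles):
    """Smallest waypoint-to-obstacle distance (segments between waypoints are not checked)."""
    return min(
        (math.hypot(x - ox, y - oy) for x, y in traj for ox, oy in obstacles),
        default=math.inf,  # no obstacles: unlimited clearance
    )

def path_length(traj):
    """Total polyline length, used here as a crude efficiency proxy."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(traj, traj[1:]))

def cost(traj, obstacles, safe_dist=2.0, w_safety=10.0, w_eff=1.0):
    """Assumed cost: hard safety constraint plus weighted safety and efficiency terms."""
    clearance = min_clearance(traj, obstacles)
    if clearance < safe_dist:
        return math.inf  # violates the (assumed) minimum-clearance constraint
    return w_safety / clearance + w_eff * path_length(traj)

def select_trajectory(nominal, obstacles, n_samples=200, sigma=1.0, seed=0):
    """Sample perturbed candidates around `nominal` and return the cheapest with its cost."""
    rng = random.Random(seed)
    candidates = [nominal] + [perturb(nominal, sigma, rng) for _ in range(n_samples)]
    best = min(candidates, key=lambda t: cost(t, obstacles))
    return best, cost(best, obstacles)

# Straight nominal path along y = 0 with one obstacle placed close to it.
nominal = [(float(x), 0.0) for x in range(0, 21, 2)]
obstacles = [(10.0, 1.0)]
best, best_cost = select_trajectory(nominal, obstacles)
```

In this toy setup the straight path passes within 1 m of the obstacle and is rejected by the clearance constraint, so the selected trajectory is one of the sampled deviations. If every candidate is infeasible, the returned cost is `math.inf`, which a caller would need to treat as "no safe trajectory found".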