HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit severe hallucination in autonomous driving, achieving only a 57.95% non-hallucination rate on safety-critical tasks, which undermines the reliability of reinforcement learning (RL) policies. Method: We propose a novel LLM-hinted contextual RL paradigm that leverages LLM-generated semantic hints to enrich state representation and policy optimization, while enabling the RL agent to actively suppress erroneous hints. Contribution/Results: We introduce the first relatively decoupled LLM–RL hint coordination mechanism, comprising three core modules: semantic caching, contextual stability anchors, and augmented semantic representation, which jointly improve decision robustness without sacrificing semantic guidance. The method integrates multi-critic RL with knowledge-base-driven weight calibration and is trained end-to-end in CARLA. Experiments demonstrate a task success rate of up to 80.3%, an 11.4% reduction in collision rate under safety-critical scenarios, and significantly improved driving stability and safety in high-density traffic.

📝 Abstract
Integrating Large Language Models (LLMs) with Reinforcement Learning (RL) can enhance autonomous driving (AD) performance in complex scenarios. However, current LLM-dominated RL methods over-rely on LLM outputs, which are prone to hallucinations. Evaluations show that a state-of-the-art LLM achieves a non-hallucination rate of only approximately 57.95% when assessed on essential driving-related tasks. Thus, in these methods, hallucinations from the LLM can directly jeopardize the performance of driving policies. This paper argues that maintaining relative independence between the LLM and the RL agent is vital for solving the hallucination problem, and accordingly proposes a novel LLM-Hinted RL paradigm. The LLM generates semantic hints for state augmentation and policy optimization to assist the RL agent in motion planning, while the RL agent counteracts potentially erroneous semantic indications through policy learning to achieve strong driving performance. Based on this paradigm, we propose the HCRMP (LLM-Hinted Contextual Reinforcement Learning Motion Planner) architecture, which includes an Augmented Semantic Representation Module to extend the state space, a Contextual Stability Anchor Module that enhances the reliability of multi-critic weight hints by utilizing information from the knowledge base, and a Semantic Cache Module that seamlessly integrates low-frequency LLM guidance with high-frequency RL control. Extensive experiments in CARLA validate HCRMP's strong overall driving performance: HCRMP achieves a task success rate of up to 80.3% under diverse driving conditions with different traffic densities, and under safety-critical driving conditions it reduces the collision rate by 11.4%, effectively improving driving performance in complex scenarios.
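The low-frequency/high-frequency coupling described above can be sketched in a few lines. The snippet below is a minimal illustration of the idea, not the paper's implementation: all class and function names (`SemanticCache`, `augment_state`, the hint format) are hypothetical, and the "LLM" is a stub. It shows an LLM being queried only every N control steps, with the cached hint embedding concatenated onto the raw observation in between.

```python
import numpy as np

class SemanticCache:
    """Toy sketch of a semantic cache: the LLM produces a hint at low
    frequency, and the high-frequency RL loop reuses the cached hint
    between queries. Names and hint format are illustrative only."""

    def __init__(self, llm_hint_fn, llm_period=10):
        self.llm_hint_fn = llm_hint_fn   # slow LLM call (stubbed here)
        self.llm_period = llm_period     # RL control steps per LLM query
        self._cached_hint = None
        self._step = 0

    def get_hint(self, observation):
        # Query the LLM only every `llm_period` steps; otherwise
        # return the most recent cached hint unchanged.
        if self._step % self.llm_period == 0 or self._cached_hint is None:
            self._cached_hint = self.llm_hint_fn(observation)
        self._step += 1
        return self._cached_hint

def augment_state(observation, hint_embedding):
    # State augmentation: concatenate the raw observation with the
    # semantic hint embedding so the policy conditions on both.
    return np.concatenate([observation, hint_embedding])

# Usage with a stubbed "LLM" that maps an observation to a 4-dim hint.
fake_llm = lambda obs: np.ones(4) * obs.mean()
cache = SemanticCache(fake_llm, llm_period=10)
obs = np.zeros(8)
state = augment_state(obs, cache.get_hint(obs))
print(state.shape)  # (12,)
```

The design point this illustrates is the "relative independence" the paper argues for: the RL policy always receives a well-formed state, and a stale or wrong hint only perturbs part of that state rather than dictating the action.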
Problem

Research questions and friction points this paper is trying to address.

Reduces LLM hallucinations in autonomous driving RL frameworks
Enhances motion planning via semantic hints and policy optimization
Improves driving performance in complex, safety-critical scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM generates hints for RL state augmentation
Contextual Stability Anchor enhances hint reliability
Semantic Cache integrates LLM and RL frequencies
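The second bullet, knowledge-base calibration of multi-critic weight hints, can be illustrated with a small numeric sketch. This is an assumption-laden toy, not the paper's method: the blend factor `trust`, the two-objective setup, and all function names are invented for illustration. It shows LLM-suggested critic weights being anchored toward a knowledge-base prior, so a hallucinated hint cannot fully dominate the combined value estimate.

```python
import numpy as np

def calibrate_weights(llm_weights, kb_prior, trust=0.5):
    """Illustrative weight calibration: blend LLM-suggested multi-critic
    weights with a knowledge-base prior, clip, and renormalize.
    `trust` (how much to believe the LLM) is an assumed parameter."""
    w = trust * np.asarray(llm_weights) + (1 - trust) * np.asarray(kb_prior)
    w = np.clip(w, 0.0, None)   # keep weights non-negative
    return w / w.sum()          # renormalize to a convex combination

def combined_value(critic_values, weights):
    # Weighted sum over per-objective critics (e.g., safety, efficiency).
    return float(np.dot(critic_values, weights))

# An erroneous hint puts all weight on efficiency; the prior pulls it back.
llm_w = [0.0, 1.0]   # [safety, efficiency] suggested by the LLM hint
prior = [0.7, 0.3]   # knowledge-base anchor for this scenario type
w = calibrate_weights(llm_w, prior, trust=0.5)
print(w)  # [0.35 0.65]
q = combined_value(np.array([1.0, 0.2]), w)  # 0.48
```

Even with the LLM assigning zero weight to safety, the anchored result keeps a 0.35 safety weight, which is the suppression-of-erroneous-hints behavior the bullets describe.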