SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models

📅 2025-02-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address safety risks arising from the decoupling of high-level reasoning and low-level control in autonomous driving, this paper proposes an end-to-end driving framework that integrates multimodal large language models (MLLMs) with explicit safety knowledge. The method introduces three key innovations: (1) a position-dependent cross-entropy (PDCE) loss that improves the numerical precision of control-signal predictions when values are represented as text; (2) a traffic-rule-driven verification mechanism based on a Markov logic network (MLN) for interpretable, regulation-compliant action validation; and (3) a driving-specific multimodal retrieval-augmented generation (RAG) module that dynamically injects safety-critical driving experience. Evaluated on multiple benchmarks, the framework achieves significant improvements over state-of-the-art approaches, demonstrating superior control accuracy, stronger regulatory adherence, and real-time, verifiable safety assurance.
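The PDCE idea can be illustrated with a short sketch. When a control value like "3.75" is emitted digit by digit, standard cross-entropy penalizes every wrong digit equally, even though an error in a leading digit changes the value far more. A minimal sketch of a position-weighted variant, with hypothetical names and a simple geometric weighting (the paper's exact formulation may differ):

```python
import math

def pdce_loss(pred_probs, target_digits, decay=0.5):
    """Position-weighted cross-entropy over digit tokens (illustrative sketch).

    pred_probs: one dict per digit position (most significant first),
                mapping digit -> predicted probability.
    target_digits: ground-truth digit string, e.g. "3.75" -> "375".
    decay: geometric weight decay; more significant digits weigh more.
    """
    loss, weight_sum = 0.0, 0.0
    for i, (probs, d) in enumerate(zip(pred_probs, target_digits)):
        w = decay ** i                      # leading digits get higher weight
        loss += -w * math.log(probs[d])     # weighted per-digit cross-entropy
        weight_sum += w
    return loss / weight_sum
```

Under this weighting, misclassifying the leading digit incurs a larger loss than the same confusion on a trailing digit, nudging the model toward numerically closer predictions.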

📝 Abstract
Traditional autonomous driving systems often struggle to integrate high-level reasoning with low-level control, resulting in suboptimal and sometimes unsafe driving behaviors. The emergence of Multimodal Large Language Models (MLLMs), which can process both visual and textual data, presents an opportunity to unify perception and reasoning tasks within a single framework. However, effectively embedding precise safety knowledge into MLLMs for autonomous driving remains a significant challenge. To address this, we propose SafeAuto, a novel framework that enhances MLLM-based autonomous driving systems by incorporating both unstructured and structured knowledge. Specifically, we first introduce the Position-Dependent Cross-Entropy (PDCE) loss function, designed to improve the accuracy of low-level control signal predictions when numerical values are represented as text. Second, to ensure safe autonomous driving by explicitly integrating precise safety knowledge into the MLLM, we develop a reasoning component for SafeAuto. This component translates driving safety regulations into first-order logic rules (e.g., "red light => stop") and incorporates these rules into a probabilistic graphical model, such as a Markov Logic Network (MLN). The MLN is trained to verify the predicted next actions using environmental attributes identified by attribute recognition models (e.g., detecting a red light) to form the predicates. Additionally, we construct a Multimodal RAG model that leverages video data, control signals, and environmental attributes to learn more effectively from past similar driving experiences. By integrating PDCE, MLN, and Multimodal RAG, SafeAuto significantly outperforms existing baselines across multiple datasets. This advancement enables more accurate, reliable, and safer autonomous driving systems that learn from experience, obey traffic laws, and perform precise control actions.
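The rule-based verification step described above can be sketched in a heavily simplified form. A real MLN performs probabilistic inference over weighted first-order formulas; the sketch below replaces that with a deterministic weighted-rule check, using hypothetical predicates and weights, just to show how recognized attributes ground the rules that validate a proposed action:

```python
# Hypothetical weighted rules: (condition predicate, required action, weight).
# In the actual framework these would be learned MLN formula weights.
RULES = [
    ("red_light", "stop", 5.0),
    ("stop_sign", "stop", 5.0),
    ("pedestrian_ahead", "slow_down", 4.0),
]

def verify_action(attributes, proposed_action, threshold=3.0):
    """Check a proposed action against grounded safety rules.

    attributes: set of predicates from attribute recognition models,
                e.g. {"red_light"}.
    Returns (is_safe, violated_rules).
    """
    violated = [
        (cond, required, w)
        for cond, required, w in RULES
        if cond in attributes and proposed_action != required and w >= threshold
    ]
    return (len(violated) == 0, violated)
```

For example, with `{"red_light"}` detected, the action `"go"` would be flagged as violating the red-light rule, while `"stop"` passes.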
Problem

Research questions and friction points this paper is trying to address.

Integrating high-level reasoning with low-level control in autonomous driving.
Embedding precise safety knowledge into Multimodal Large Language Models.
Improving autonomous driving accuracy and safety using multimodal data.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Position-Dependent Cross-Entropy loss for control accuracy
Markov Logic Network for safety rule integration
Multimodal RAG model for enhanced learning from experience
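The retrieval component listed above boils down to nearest-neighbor search over embeddings of past driving experiences. A minimal sketch, assuming each experience has already been encoded into a single vector (the paper fuses video, control signals, and environmental attributes; the encoder and memory format here are illustrative):

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def retrieve(query_vec, memory, k=2):
    """Return the k past experiences most similar to the current scene.

    memory: list of (embedding, experience) pairs from past drives.
    """
    ranked = sorted(memory, key=lambda m: cosine(query_vec, m[0]), reverse=True)
    return [exp for _, exp in ranked[:k]]
```

The retrieved experiences would then be appended to the MLLM prompt as in-context examples of how similar situations were handled safely.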
Jiawei Zhang
University of Chicago
Xuan Yang
Southern University of Science and Technology
Taiqi Wang
Nuro
Yu Yao
Nuro
Aleksandr Petiushko
PhD - Moscow State University; Head of AI Research - Elea
discrete mathematics · machine learning · deep learning · computer vision · AV
Bo Li
University of Illinois Urbana-Champaign, Virtue AI