LLaViDA: A Large Language Vision Driving Assistant for Explicit Reasoning and Enhanced Trajectory Planning

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the weak generalization and poor few-shot adaptability of end-to-end trajectory planners under adverse weather, complex road networks, and uncertain human behavior, this paper proposes the first driving-assistant framework to integrate vision-language understanding with explicit symbolic reasoning. Our core contribution is Trajectory Preference Optimization (TPO), a novel method that couples chain-of-thought reasoning with semantic motion prediction to enable interpretable, few-shot-generalizable planning driven by a vision-language model (VLM). TPO employs a two-stage training paradigm (supervised fine-tuning followed by preference alignment optimization) to jointly model multimodal perception, object motion regression, and logic-constrained reasoning. Evaluated on nuScenes, our approach achieves a mean L2 trajectory error of 0.31 m and a collision rate of 0.10%, significantly outperforming state-of-the-art end-to-end, VLM-based, and LLM-based baselines.
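The summary does not spell out the TPO objective, but "preference alignment optimization" over trajectories suggests a DPO-style contrastive loss in which the trainable VLM is pushed to prefer the lower-error trajectory of a pair relative to a frozen reference model. The sketch below rests on that assumption; the function name `tpo_loss`, the pairing scheme, and `beta` are illustrative choices, not details from the paper.

```python
import torch
import torch.nn.functional as F

def tpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO-style preference loss over trajectory pairs (illustrative sketch).

    *_w: log-probability of the preferred (lower-L2) trajectory text,
    *_l: log-probability of the rejected trajectory text,
    under the trainable policy VLM and a frozen reference VLM.
    """
    # Log-ratio of policy to reference for each trajectory in the pair.
    ratio_w = policy_logp_w - ref_logp_w
    ratio_l = policy_logp_l - ref_logp_l
    # Reward a positive margin between preferred and rejected trajectories.
    return -F.logsigmoid(beta * (ratio_w - ratio_l)).mean()

# Dummy batch of 4 preference pairs.
loss = tpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
```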

📝 Abstract
Trajectory planning is a fundamental yet challenging component of autonomous driving. End-to-end planners frequently falter under adverse weather, unpredictable human behavior, or complex road layouts, primarily because they lack strong generalization or few-shot capabilities beyond their training data. We propose LLaViDA, a Large Language Vision Driving Assistant that leverages a Vision-Language Model (VLM) for object motion prediction, semantic grounding, and chain-of-thought reasoning in autonomous-driving trajectory planning. A two-stage training pipeline, supervised fine-tuning followed by Trajectory Preference Optimization (TPO), enhances scene understanding and trajectory planning by injecting regression-based supervision, producing a powerful "VLM Trajectory Planner for Autonomous Driving." On the nuScenes open-loop trajectory planning benchmark, LLaViDA surpasses state-of-the-art end-to-end and recent VLM/LLM-based baselines, achieving an average L2 trajectory error of 0.31 m and a collision rate of 0.10% on the test set. The code for this paper is available on GitHub.
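For context, these numbers follow the common nuScenes open-loop protocol: the L2 metric averages the Euclidean distance between predicted and ground-truth ego waypoints over the planning horizon, and the collision rate is the fraction of frames whose predicted trajectory overlaps another agent. A minimal sketch of the L2 side (array shapes are assumptions; the collision check needs occupancy grids and is omitted):

```python
import numpy as np

def mean_l2_error(pred, gt):
    """Average L2 distance (meters) between predicted and ground-truth ego
    waypoints. pred, gt: arrays of shape (frames, horizon, 2) in BEV meters."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: 2 frames, 6 future waypoints (3 s at 2 Hz), (x, y) per waypoint.
pred = np.zeros((2, 6, 2))
gt = np.full((2, 6, 2), 0.3)
print(mean_l2_error(pred, gt))  # ~0.42 = sqrt(0.3**2 + 0.3**2)
```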
Problem

Research questions and friction points this paper is trying to address.

Improves trajectory planning in autonomous driving under adverse conditions
Enhances generalization and few-shot capabilities beyond training data
Reduces trajectory error and collision rates in complex scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Model for motion prediction and reasoning
Two-stage training with supervised fine-tuning and Trajectory Preference Optimization
Chain-of-thought reasoning for enhanced trajectory planning (see the prompt sketch after this list)
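As a concrete illustration of the chain-of-thought bullet above, a VLM planner of this kind is typically queried with a template that walks from scene description through per-object motion prediction to a final waypoint answer. The template below is hypothetical, not LLaViDA's actual prompt:

```python
# Hypothetical chain-of-thought planning prompt; the structure and field
# names are illustrative, not taken from the paper.
PLANNING_PROMPT = """You are a driving assistant. Given the front-camera image
and ego state (speed: {speed} m/s, command: {command}):
1. Describe the scene (weather, road layout, traffic signals).
2. For each nearby object, predict its motion over the next 3 s.
3. Reason step by step about risks and the drivable corridor.
4. Output 6 ego waypoints at 0.5 s intervals as (x, y) in meters.
"""

print(PLANNING_PROMPT.format(speed=5.2, command="turn left"))
```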
👥 Authors
Yudong Liu
Duke University
Spencer Hallyburton
Duke University
Jiwoo Kim
Sungkyunkwan University, Department of Artificial Intelligence
Yueqian Lin
PhD Student, Duke University
Yiming Li
Duke University
Qinsi Wang
Duke University
Hui Ye
Georgia State University
Jingwei Sun
University of Florida
Miroslav Pajic
Duke University
Yiran Chen
Duke University
Hai Li
Duke University