Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning

📅 2026-03-11

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the inconsistency between the high-level decisions of vision-language models (VLMs) and the low-level plans of end-to-end (E2E) driving policies, which can cause generated trajectories to deviate from the intended driving decision. It proposes Senna-2, a VLM-E2E driving policy trained with a consistency-oriented three-stage paradigm: first, driving pre-training; second, open-loop alignment between the VLM and the E2E policy; and third, closed-loop alignment via bottom-up hierarchical reinforcement learning in 3D Gaussian Splatting (3DGS) environments. The reported gains are a 19.3% improvement in dual-system consistency F1 score, a 5.7% reduction in open-loop final displacement error (FDE), and a 30.6% reduction in closed-loop AF-CR.
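As a point of reference for the open-loop metric, final displacement error is conventionally the Euclidean distance between the last predicted waypoint and the last ground-truth waypoint. The sketch below illustrates that standard definition and how a relative reduction such as 5.7% is computed. The function names, the 2D waypoint format, and the sample values are assumptions for illustration; they are not taken from the paper or its code.

```python
import math


def final_displacement_error(pred, gt):
    """Euclidean distance between the final predicted and final ground-truth waypoint.

    `pred` and `gt` are sequences of (x, y) tuples in metres (assumed format).
    """
    if not pred or not gt:
        raise ValueError("trajectories must be non-empty")
    (px, py), (gx, gy) = pred[-1], gt[-1]
    return math.hypot(px - gx, py - gy)


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - improved) / baseline


# Hypothetical waypoints, not data from the paper.
pred = [(1.0, 0.0), (2.0, 0.1), (3.0, 0.4)]
gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
fde = final_displacement_error(pred, gt)
```

Only the final waypoint enters the metric, so two trajectories that differ along the way but end at the same point have the same FDE.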


📝 Abstract

Vision-language models (VLMs) enhance the planning capability of end-to-end (E2E) driving policies by leveraging high-level semantic reasoning. However, existing approaches often overlook the dual-system consistency between the VLM's high-level decisions and the E2E policy's low-level planning. As a result, the generated trajectories may misalign with the intended driving decisions, which weakens the system's top-down guidance and decision-following ability. To address this issue, we propose Senna-2, an advanced VLM-E2E driving policy that explicitly aligns the two systems for consistent decision-making and planning. Our method follows a consistency-oriented three-stage training paradigm. In the first stage, we conduct driving pre-training to achieve preliminary decision-making and planning, with a decision adapter transmitting VLM decisions to the E2E policy in the form of implicit embeddings. In the second stage, we align the VLM and the E2E policy in an open-loop setting. In the third stage, we perform closed-loop alignment via bottom-up Hierarchical Reinforcement Learning in 3DGS environments to reinforce safety and efficiency. Extensive experiments demonstrate that Senna-2 achieves superior dual-system consistency (19.3% F1 score improvement) and significantly enhances driving safety in both open-loop (5.7% FDE reduction) and closed-loop settings (30.6% AF-CR reduction).
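The abstract reports dual-system consistency as an F1 score but does not say how it is computed. One plausible reading, sketched below purely as an assumption, treats the VLM's decision as the reference label, derives a discrete decision from each planned trajectory, and takes the macro-averaged F1 over decision classes. The label names, the function names, and the choice of macro averaging are hypothetical and are not confirmed by the source.

```python
def f1_for_label(vlm_decisions, traj_decisions, label):
    """F1 for one decision class, with VLM decisions as reference (assumption)."""
    pairs = list(zip(vlm_decisions, traj_decisions))
    tp = sum(1 for v, t in pairs if v == label and t == label)
    fp = sum(1 for v, t in pairs if v != label and t == label)
    fn = sum(1 for v, t in pairs if v == label and t != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(vlm_decisions, traj_decisions):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(vlm_decisions) != len(traj_decisions):
        raise ValueError("decision lists must have the same length")
    if not vlm_decisions:
        raise ValueError("decision lists must be non-empty")
    labels = sorted(set(vlm_decisions) | set(traj_decisions))
    scores = [f1_for_label(vlm_decisions, traj_decisions, lb) for lb in labels]
    return sum(scores) / len(labels)


# Hypothetical decisions, not data from the paper.
vlm = ["keep", "left", "left", "stop"]    # high-level decisions from the VLM
traj = ["keep", "left", "keep", "stop"]   # decisions inferred from planned trajectories
score = macro_f1(vlm, traj)
```

Under this reading, a higher score means the planned trajectories more often carry out the decision the VLM issued, which is the decision-following behaviour the three training stages aim to improve.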

Problem

Research questions and friction points this paper is trying to address.

vision-language models

end-to-end driving policy

dual-system consistency

decision-making

trajectory planning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models

End-to-End Driving Policy

Dual-System Consistency