Unveiling the Surprising Efficacy of Navigation Understanding in End-to-End Autonomous Driving

2026-04-13

AI Summary
Current end-to-end autonomous driving systems struggle to effectively integrate global navigation information due to their heavy reliance on local perception, leading to inadequate route-following performance in complex scenarios. This work proposes the Sequential Navigation Guidance (SNG) framework, which systematically reveals, for the first time, the critical role of navigation understanding in end-to-end driving. To support this, the authors introduce the SNG-QA dataset, constructed from real-world navigation trajectories paired with turn-by-turn instructions. Building upon this foundation, they design the multimodal SNG-VLA model, which aligns global navigation intent with local trajectory planning without requiring auxiliary perception-based losses. Experimental results demonstrate that SNG-VLA achieves state-of-the-art performance in both navigation comprehension and trajectory planning, significantly improving navigation accuracy in challenging environments.
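One way to make the claim about weak use of navigation input concrete is to hold the scene fixed, vary only the navigation command, and measure how much the planned trajectory changes. The sketch below is illustrative only and is not the paper's evaluation protocol: `navigation_sensitivity`, the two toy planners, and the command vocabulary are all hypothetical names introduced here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Trajectory = List[Tuple[float, float]]      # planned (x, y) waypoints in metres
Planner = Callable[[object, str], Trajectory]  # hypothetical planner interface


def average_displacement(a: Trajectory, b: Trajectory) -> float:
    """Mean Euclidean distance between corresponding waypoints."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def navigation_sensitivity(planner: Planner, scene: object,
                           commands: Sequence[str]) -> float:
    """Mean pairwise displacement between plans for the same scene
    under different navigation commands. A value near zero suggests
    the planner ignores its navigation input."""
    plans = [planner(scene, c) for c in commands]
    pairs = [(i, j) for i in range(len(plans)) for j in range(i + 1, len(plans))]
    return sum(average_displacement(plans[i], plans[j]) for i, j in pairs) / len(pairs)


def nav_blind_planner(scene: object, command: str) -> Trajectory:
    # Toy planner (assumption): always drives straight, whatever the command.
    return [(0.0, float(t)) for t in range(1, 4)]


def nav_aware_planner(scene: object, command: str) -> Trajectory:
    # Toy planner (assumption): drifts laterally according to the command.
    lateral = {"left": -1.0, "straight": 0.0, "right": 1.0}[command]
    return [(lateral * t, float(t)) for t in range(1, 4)]


commands = ["left", "straight", "right"]
blind = navigation_sensitivity(nav_blind_planner, None, commands)
aware = navigation_sensitivity(nav_aware_planner, None, commands)
print(f"blind={blind:.3f} aware={aware:.3f}")
```

A planner that scores near zero on a probe like this produces the same plan regardless of the route it was asked to follow, which is the failure mode the summary describes.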

๐Ÿ“ Abstract
Global navigation information and local scene understanding are two crucial components of autonomous driving systems. However, our experimental results indicate that many end-to-end autonomous driving systems tend to over-rely on local scene understanding while failing to utilize global navigation information. These systems exhibit weak correlation between their planning capabilities and navigation input, and struggle to perform navigation-following in complex scenarios. To overcome this limitation, we propose the Sequential Navigation Guidance (SNG) framework, an efficient representation of global navigation information based on real-world navigation patterns. SNG encompasses both navigation paths, which constrain long-term trajectories, and turn-by-turn (TBT) information, which supports real-time decision-making. We construct SNG-QA, a visual question answering (VQA) dataset based on SNG that aligns global and local planning. Additionally, we introduce an efficient model, SNG-VLA, that fuses local planning with global planning. SNG-VLA achieves state-of-the-art performance through precise navigation information modeling without requiring auxiliary loss functions from perception tasks. Project page: SNG-VLA
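The abstract describes SNG as two parts: a navigation path for long-horizon constraints and turn-by-turn (TBT) information for immediate decisions. A minimal sketch of how such a record might be represented follows. The class names, field names, units, and maneuver vocabulary are assumptions made for illustration; the actual SNG and SNG-QA schema is not given in this summary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Maneuver(Enum):
    # Hypothetical maneuver vocabulary; the paper's TBT label set may differ.
    STRAIGHT = "go straight"
    LEFT = "turn left"
    RIGHT = "turn right"
    U_TURN = "make a U-turn"


@dataclass
class TurnByTurn:
    maneuver: Maneuver
    distance_m: float  # distance from the ego vehicle to the maneuver point


@dataclass
class SequentialNavigationGuidance:
    # Coarse route waypoints (x, y) in metres, ego frame (assumed convention).
    path: List[Tuple[float, float]]
    # Upcoming instructions, nearest first.
    tbt: List[TurnByTurn]

    def next_instruction(self) -> str:
        """Render the nearest TBT step as text, e.g. for a VQA-style prompt."""
        if not self.tbt:
            return "Follow the route."
        step = self.tbt[0]
        return f"In {step.distance_m:.0f} m, {step.maneuver.value}."


sng = SequentialNavigationGuidance(
    path=[(0.0, 0.0), (0.0, 40.0), (-30.0, 40.0)],
    tbt=[TurnByTurn(Maneuver.LEFT, 40.0)],
)
print(sng.next_instruction())  # In 40 m, turn left.
```

Keeping the path and the TBT list as separate fields mirrors the split the abstract draws between long-term trajectory constraints and real-time decision logic.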
Problem

Research questions and friction points this paper is trying to address.

autonomous driving
navigation understanding
end-to-end learning
global navigation
local scene understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sequential Navigation Guidance
End-to-End Autonomous Driving
Navigation Understanding
SNG-VLA
Visual Question Answering
Authors

Zhihua Hua
College of Intelligent Robotics and Advanced Manufacturing, Fudan University

Junli Wang
Tsinghua University
Natural Language Processing

Pengfei Li
Institute of intelligent industry, Tsinghua University
Embodied AI, Autonomous Driving, Computer Vision

Qihao Jin
College of Intelligent Robotics and Advanced Manufacturing, Fudan University; Institute for AI Industry Research (AIR), Tsinghua University

Bo Zhang
Meituan
MLLM, Model Compression, AutoML, Computer Vision

Kehua Sheng
Didi Chuxing

Yilun Chen
Institute for AI Industry Research (AIR), Tsinghua University

Zhongxue Gan
College of Intelligent Robotics and Advanced Manufacturing, Fudan University

Wenchao Ding
Tenure-track Associate Professor, Fudan University
Robotics, Motion Planning, Autonomous Navigation, Decision Making