Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision-language models (VLMs) in autonomous driving suffer from high inference latency, while existing early-exit mechanisms exhibit poor generalization across diverse driving scenarios. Method: This paper proposes Nav-EE, a navigation-guided early-exit framework that introduces semantic navigation priors—such as intersections and traffic lights—into exit decision-making for the first time. Nav-EE jointly leverages offline precomputation and online dynamic scheduling to enable task-adaptive layer-wise termination, without requiring task-specific fine-tuning. Contribution/Results: Evaluated on CODA, Waymo, and BOSCH benchmarks, Nav-EE achieves up to 63.9% (average 58.7%) reduction in inference latency with <1.2% accuracy degradation. In real-vehicle deployment, end-to-end latency decreases from 600 ms to 300 ms, significantly enhancing real-time decision-making capability under complex urban conditions.

📝 Abstract
Vision-Language Models (VLMs) are increasingly applied in autonomous driving for unified perception and reasoning, but high inference latency hinders real-time deployment. Early-exit reduces latency by terminating inference at intermediate layers, yet its task-dependent nature limits generalization across diverse scenarios. We observe that autonomous driving naturally mitigates this limitation: navigation systems can anticipate upcoming contexts (e.g., intersections, traffic lights), indicating which tasks will be required. We propose Nav-EE, a navigation-guided early-exit framework that precomputes task-specific exit layers offline and dynamically applies them online based on navigation priors. Experiments on CODA, Waymo, and BOSCH show that Nav-EE achieves accuracy comparable to full inference while reducing latency by up to 63.9%. Real-vehicle integration with Autoware Universe further demonstrates reduced inference latency (600 ms to 300 ms), supporting faster decision-making in complex scenarios. These results suggest that coupling navigation foresight with early-exit offers a viable path toward efficient deployment of large models in autonomous systems. Code and data are available at our anonymous repository: https://anonymous.4open.science/r/Nav-EE-BBC4
Problem

Research questions and friction points this paper is trying to address.

Reducing inference latency in vision-language models for autonomous driving
Improving early-exit generalization across diverse driving scenarios
Dynamically adapting computation using navigation context predictions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Navigation-guided early-exit framework for autonomous driving
Precomputes task-specific exit layers offline using navigation priors
Dynamically applies exit layers online to reduce latency
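The offline/online split described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the table `NAV_EXIT_LAYERS`, the function names, and the specific layer counts are hypothetical stand-ins for the paper's precomputed task-specific exit layers and navigation-prior lookup.

```python
# Hedged sketch of navigation-guided early exiting (Nav-EE-style scheduling).
# All names and numbers here are illustrative assumptions, not from the paper.

# Offline stage: per-task exit layers, e.g. chosen by profiling the
# accuracy/latency trade-off of each navigation context on held-out data.
NAV_EXIT_LAYERS = {
    "intersection": 12,  # sign/traffic-light reasoning needs deeper layers
    "highway": 6,        # lane-level queries can terminate early
    "default": 24,       # unknown context falls back to full-depth inference
}

def run_layer(hidden, layer_idx):
    """Stand-in for one transformer layer of the VLM (dummy computation)."""
    return hidden + 1

def navigation_guided_inference(hidden, nav_context, num_layers=24):
    """Online stage: look up the precomputed exit layer for the current
    navigation context and stop inference there instead of running all
    num_layers, trading a small accuracy loss for lower latency."""
    exit_layer = NAV_EXIT_LAYERS.get(nav_context, NAV_EXIT_LAYERS["default"])
    for i in range(min(exit_layer, num_layers)):
        hidden = run_layer(hidden, i)
    return hidden, exit_layer

# Example: on a highway segment the model exits after 6 of 24 layers.
out, used_layers = navigation_guided_inference(0, "highway")
print(used_layers)  # 6
```

Because the exit layer is fixed per task ahead of time, the online step is a constant-time table lookup driven by the route, which is what lets Nav-EE avoid task-specific fine-tuning or per-token confidence estimation.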
Haibo Hu
Department of Computer Science, City University of Hong Kong
Lianming Huang
Department of Computer Science, City University of Hong Kong
Xinyu Wang
Department of Computer Science, McGill University
Yufei Cui
McGill University, MILA
Medical AI · RAG · LLM Agent · Predictive Uncertainty
Nan Guan
City University of Hong Kong
Cyber-Physical Systems · Embedded Systems · Real-Time Systems
Chun Jason Xue
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Systems and Storage