AD-EE: Early Exiting for Fast and Reliable Vision-Language Models in Autonomous Driving

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address reasoning-induced latency and computational overhead in vision-language models (VLMs) for autonomous driving, this paper proposes the first early-exit framework to integrate causal inference with driving-domain priors. Unlike conventional heuristic exit strategies, the method constructs a hierarchical causal graph to model dependencies among reasoning steps and combines dynamic confidence estimation with domain-adaptive exit decisions, terminating redundant computation once sufficient semantic evidence has been accumulated. Evaluated on the Waymo and CODA benchmarks, the framework reduces inference latency by up to 57.58% while improving object detection accuracy by up to 44%. Real-world deployment on the Autoware Universe platform confirms consistently low latency (an average reduction of 51.2%) and improved robustness. The core contribution is the integration of causal reasoning into VLM early-exit decisions, enabling task-aware, interpretable, and adaptively efficient inference for autonomous driving.
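The exit mechanism described above can be illustrated with a per-layer confidence check: run layers sequentially and stop as soon as the intermediate prediction is confident enough. The sketch below is a minimal toy, not the AD-EE implementation; the layer stack, the 0.9 threshold, and the idea of reading logits directly from each layer are all illustrative assumptions (the paper additionally uses a causal graph and domain priors to pick exit layers).

```python
# Minimal illustrative sketch of confidence-based early exiting.
# NOT the AD-EE implementation: layers, threshold, and the direct
# logit readout are hypothetical stand-ins for the real model.
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def early_exit_infer(
    layers: List[Callable[[List[float]], List[float]]],
    x: List[float],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Run layers in order; exit once max class probability >= threshold.

    Returns (predicted_class, index_of_exit_layer).
    """
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        probs = softmax(h)
        conf = max(probs)
        if conf >= threshold:  # sufficient evidence: skip remaining layers
            return probs.index(conf), i
    probs = softmax(h)  # fell through: use the final layer's prediction
    return probs.index(max(probs)), len(layers) - 1


# Toy 12-layer stack: each layer sharpens the logit for class 2.
layers = [
    lambda h: [v * 1.5 if j == 2 else v for j, v in enumerate(h)]
    for _ in range(12)
]
pred, exit_at = early_exit_infer(layers, [0.1, 0.1, 0.5, 0.1], threshold=0.9)
# pred is class 2; exit_at < 11, i.e. the last layers are never executed.
```

The latency saving comes from the layers after `exit_at` never running; in a real VLM the per-layer classifier head and the exit threshold would have to be trained and calibrated, which is where the paper's causal analysis comes in.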

📝 Abstract
With the rapid advancement of autonomous driving, deploying Vision-Language Models (VLMs) to enhance perception and decision-making has become increasingly common. However, the real-time application of VLMs is hindered by high latency and computational overhead, limiting their effectiveness in time-critical driving scenarios. This challenge is particularly evident when VLMs exhibit over-inference, continuing to process unnecessary layers even after confident predictions have been reached. To address this inefficiency, we propose AD-EE, an Early Exit framework that incorporates domain characteristics of autonomous driving and leverages causal inference to identify optimal exit layers. We evaluate our method on large-scale real-world autonomous driving datasets, including Waymo and the corner-case-focused CODA, as well as on a real vehicle running the Autoware Universe platform. Extensive experiments across multiple VLMs show that our method significantly reduces latency, with maximum improvements reaching up to 57.58%, and enhances object detection accuracy, with maximum gains of up to 44%.
Problem

Research questions and friction points this paper is trying to address.

Reducing high latency in Vision-Language Models for autonomous driving
Minimizing computational overhead in real-time VLM applications
Addressing over-inference by optimizing early exit layers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Early Exit framework for autonomous driving VLMs
Leverages causal inference for optimal exit layers
Reduces latency and enhances detection accuracy