DeeAD: Dynamic Early Exit of Vision-Language Action for Efficient Autonomous Driving

📅 2025-11-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Vision-language-action (VLA) models for autonomous driving incur high inference latency because of their deep transformer stacks. To address this, the paper proposes DeeAD, a training-free dynamic early-exit framework. A lightweight physical feasibility module evaluates intermediate trajectories in real time against kinematic constraints and a deviation tolerance (< 2 m) relative to planning priors such as navigation cues or coarse, low-precision trajectories, and inference stops once a trajectory is within tolerance. An action-guided multi-hop controller additionally skips redundant transformer layers based on how quickly the feasibility scores change. The core idea is to use planning priors, rather than confidence scores, as the early-exit criterion, which accelerates existing VLA models such as ORION without fine-tuning. On the Bench2Drive benchmark, the approach achieves up to 28% transformer-layer sparsity and a 29% reduction in end-to-end latency while preserving planning quality and safety.

📝 Abstract

Vision-Language Action (VLA) models unify perception, reasoning, and trajectory generation for autonomous driving, but suffer from significant inference latency due to deep transformer stacks. We present DeeAD, a training-free, action-guided early-exit framework that accelerates VLA planning by evaluating the physical feasibility of intermediate trajectories. Instead of relying on confidence scores, DeeAD terminates inference when predicted trajectories align with lightweight planning priors (e.g., Navigation or Low-precision Planning) within a tolerable deviation (< 2 m). To improve efficiency, we introduce a multi-hop controller that adaptively skips redundant layers based on the rate of change of these feasibility scores. DeeAD integrates into existing VLA models, such as ORION, without requiring retraining. Experiments on the Bench2Drive benchmark demonstrate up to 28% transformer-layer sparsity and 29% latency reduction, while preserving planning quality and safety.
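
The exit criterion described in the abstract can be illustrated with a minimal sketch. The abstract only states that an intermediate trajectory must stay within a tolerable deviation (under 2 m) of a planning prior; the specific metric used here (maximum pointwise Euclidean distance between matched 2D waypoints) and the names `max_deviation` and `should_exit` are assumptions made for illustration, not the paper's implementation.

```python
import math
from typing import Sequence, Tuple

Waypoint = Tuple[float, float]  # (x, y) position in metres


def max_deviation(traj: Sequence[Waypoint], prior: Sequence[Waypoint]) -> float:
    """Largest Euclidean distance between matched waypoints.

    Assumption: both trajectories are sampled at the same time steps,
    so waypoint i of `traj` corresponds to waypoint i of `prior`.
    """
    if not traj or len(traj) != len(prior):
        raise ValueError("trajectories must be non-empty and of equal length")
    return max(
        math.hypot(x - px, y - py)
        for (x, y), (px, py) in zip(traj, prior)
    )


def should_exit(
    traj: Sequence[Waypoint],
    prior: Sequence[Waypoint],
    tol_m: float = 2.0,  # tolerance taken from the "< 2 m" figure in the abstract
) -> bool:
    """Exit early when the intermediate trajectory stays within tolerance."""
    return max_deviation(traj, prior) < tol_m


# Hypothetical coarse prior along the x-axis and two intermediate predictions.
prior = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
close_traj = [(0.0, 0.5), (1.0, 1.0), (2.0, 1.5)]  # worst-case offset 1.5 m
far_traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)]    # worst-case offset 3.0 m

exit_close = should_exit(close_traj, prior)
exit_far = should_exit(far_traj, prior)
```

With these inputs the first trajectory would trigger an exit and the second would not, so inference would continue to deeper layers for the second case.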

Problem

Research questions and friction points this paper is trying to address.

Reducing inference latency in vision-language action models for autonomous driving

Accelerating trajectory planning by evaluating physical feasibility early

Maintaining planning quality while achieving computational efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free early exit for VLA acceleration

Action-guided feasibility check for trajectory evaluation

Multi-hop controller skipping redundant transformer layers
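
As a rough illustration of the multi-hop idea, the sketch below chooses how many layers to advance before the next feasibility check by linearly extrapolating the score's rate of change. The page only says that hops depend on the change rate of scores; the extrapolation rule, the fallback to the largest hop when the score is not improving, and the names `run_with_multi_hop`, `score_at`, and `max_hop` are all assumptions for illustration rather than the paper's controller.

```python
import math
from typing import Callable, List, Tuple


def run_with_multi_hop(
    score_at: Callable[[int], float],  # hypothetical: deviation (m) of the trajectory decoded at a layer
    num_layers: int,
    tol_m: float = 2.0,
    max_hop: int = 4,
) -> Tuple[int, List[int]]:
    """Return (exit_layer, layers_where_the_score_was_evaluated)."""
    evaluated: List[int] = []
    layer, hop = 0, 1
    prev_score = None
    prev_layer = None
    while layer < num_layers:
        score = score_at(layer)
        evaluated.append(layer)
        if score < tol_m:
            return layer, evaluated  # feasible: exit early at this layer
        if prev_score is not None:
            # Improvement in deviation per layer since the last check.
            rate = (prev_score - score) / (layer - prev_layer)
            if rate > 0:
                # Estimated layers still needed to reach the tolerance.
                needed = math.ceil((score - tol_m) / rate)
                hop = min(max_hop, max(1, needed))
            else:
                # Assumed rule: no improvement, so take the largest hop.
                hop = max_hop
        prev_score, prev_layer = score, layer
        layer += hop
    # No early exit: fall back to the full-depth output.
    return num_layers - 1, evaluated


# Toy score that improves by 1 m per layer, starting from 10 m.
exit_layer, checked = run_with_multi_hop(
    lambda l: max(0.0, 10.0 - 1.0 * l), num_layers=32
)
```

In this toy run the controller checks only layers 0, 1, 5, 8, and 9 before exiting at layer 9, instead of evaluating every layer on the way.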

🔎 Similar Papers

MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving