AI Summary
This work addresses the perceptual degradation and cumulative instability in long-horizon planning that arise when visual encoders are unfrozen in vision-language-based autonomous driving models. To mitigate these issues, the authors propose a collaborative perception-planning distillation framework. The approach introduces a self-anchored visual distillation mechanism to enhance robustness in perceiving critical regions and designs a future-aware "oracle" teacher model that leverages trajectory-guided attention and a coarse-to-fine distillation strategy to refine predicted trajectories. Furthermore, Monte Carlo dropout sampling is integrated to improve uncertainty modeling. Evaluated in open-loop settings, the method achieves state-of-the-art performance and significantly enhances closed-loop driving outcomes.
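The trajectory-guided attention idea can be illustrated with a minimal sketch: waypoints projected into the camera view define a soft spatial prior that upweights nearby feature-map regions. Everything here (the `trajectory_attention` name, the Gaussian weighting, the assumption that waypoints are already in pixel coordinates) is an illustrative stand-in, not the paper's actual implementation.

```python
import numpy as np

def trajectory_attention(feature_map, waypoints_px, sigma=8.0):
    """Weight an (H, W, C) feature map by proximity to trajectory waypoints.

    waypoints_px: (N, 2) array of (row, col) pixel coordinates, assumed to be
    the ego trajectory already projected into the camera view.
    """
    H, W, _ = feature_map.shape
    rows, cols = np.mgrid[0:H, 0:W]
    attn = np.zeros((H, W))
    for r, c in waypoints_px:
        # Gaussian bump centered on each waypoint; keep the max over waypoints.
        bump = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
        attn = np.maximum(attn, bump)
    # Broadcast the spatial prior over channels.
    return feature_map * attn[..., None], attn

# Toy usage: one waypoint at the center of a 32x32 map with 4 channels.
fm = np.ones((32, 32, 4))
wp = np.array([[16, 16]])
weighted, attn = trajectory_attention(fm, wp)
```

Regions far from the projected trajectory are attenuated smoothly rather than masked out, so the student still sees global context while the key regions dominate the distillation signal.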
Abstract
Vision-Language-Action models have shown great promise for autonomous driving, yet they suffer from degraded perception after unfreezing the visual encoder and struggle with accumulated instability in long-term planning. To address these challenges, we propose EvoDriveVLA, a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and oracle-guided trajectory optimization. Specifically, self-anchored visual distillation leverages a self-anchor teacher to deliver visual anchoring constraints, regularizing student representations via trajectory-guided key-region awareness. In parallel, oracle-guided trajectory distillation employs a future-aware oracle teacher with coarse-to-fine trajectory refinement and Monte Carlo dropout sampling to produce high-quality trajectory candidates, from which the optimal trajectory is selected to guide the student's prediction. EvoDriveVLA achieves SOTA performance in open-loop evaluation and significantly improves performance in closed-loop evaluation. Our code is available at: https://github.com/hey-cjj/EvoDriveVLA.
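The Monte Carlo dropout sampling and candidate-selection step can be sketched as follows. This is a minimal illustration, assuming a toy linear trajectory head with inference-time dropout, and a reference-trajectory scoring rule standing in for the oracle teacher's future-aware selection; all names and shapes are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def trajectory_head(features, weights, drop_rate=0.2):
    # MC Dropout: keep dropout active at inference time so each forward
    # pass yields a different trajectory hypothesis.
    mask = rng.random(features.shape) > drop_rate
    h = features * mask / (1.0 - drop_rate)  # inverted-dropout scaling
    return h @ weights  # flattened waypoint trajectory

def mc_dropout_candidates(features, weights, n_samples=8):
    # Sample a population of trajectory candidates via repeated stochastic passes.
    return [trajectory_head(features, weights) for _ in range(n_samples)]

def select_best(candidates, reference):
    # Oracle stand-in: score each candidate against a reference trajectory
    # (e.g. the recorded future) and keep the closest one.
    errors = [float(np.mean((c - reference) ** 2)) for c in candidates]
    best = int(np.argmin(errors))
    return candidates[best], errors[best]

# Toy usage: 16-d features, 4 waypoints x 2 coordinates flattened to 8 values.
features = rng.standard_normal(16)
weights = rng.standard_normal((16, 8))
reference = np.zeros(8)
cands = mc_dropout_candidates(features, weights)
best_traj, err = select_best(cands, reference)
```

The spread of the sampled candidates doubles as a cheap uncertainty estimate: a wide spread flags situations where the planner's prediction should be trusted less.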