🤖 AI Summary
This work addresses the catastrophic forgetting that arises when fine-tuning vision-language models for autonomous driving, a process that often erases the general-purpose knowledge acquired during pretraining. The study presents the first systematic quantitative analysis of this issue and introduces the first forgetting-evaluation benchmark tailored to autonomous driving. To mitigate forgetting while adapting to downstream tasks, the authors propose the Drive Expert Adapter framework, which employs a dynamic expert-routing mechanism in the prompt space to decouple task adaptation from knowledge retention, without modifying the base model's parameters. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on driving-related tasks while significantly alleviating catastrophic forgetting and preserving the model's generalization capabilities.
📝 Abstract
The integration of Vision-Language Models (VLMs) into autonomous driving promises to solve long-tail scenarios, but this paradigm faces the critical and unaddressed challenge of catastrophic forgetting. The very fine-tuning process used to adapt these models to driving-specific data simultaneously erodes their invaluable pre-trained world knowledge, creating a self-defeating paradox that undermines the core reason for their use. This paper provides the first systematic investigation into this phenomenon. We introduce a new large-scale dataset of 180K scenes, which enables the first benchmark specifically designed to quantify catastrophic forgetting in autonomous driving. Our analysis reveals that existing methods suffer from significant knowledge degradation. To address this, we propose the Drive Expert Adapter (DEA), a novel framework that circumvents this trade-off by shifting adaptation from the weight space to the prompt space. DEA dynamically routes inference through different knowledge experts based on scene-specific cues, enhancing driving-task performance without corrupting the model's foundational parameters. Extensive experiments demonstrate that our approach not only achieves state-of-the-art results on driving tasks but also effectively mitigates catastrophic forgetting, preserving the essential generalization capabilities that make VLMs a transformative force for autonomous systems. The dataset and model are released at FidelityDrivingBench.
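To make the prompt-space routing idea concrete, here is a minimal sketch of how scene-specific cues could select a per-expert soft prompt that is prepended to the input embeddings while the base model's weights stay frozen. All names, dimensions, and the hard top-1 routing rule below are illustrative assumptions, not the paper's actual DEA implementation.

```python
import numpy as np

# Illustrative sketch only: the real DEA architecture may route differently.
rng = np.random.default_rng(0)
EMBED_DIM, PROMPT_LEN, NUM_EXPERTS = 16, 4, 3

# One bank of learnable soft-prompt tokens per knowledge expert
# (e.g. urban, highway, adverse-weather experts -- hypothetical roles).
expert_prompts = rng.normal(size=(NUM_EXPERTS, PROMPT_LEN, EMBED_DIM))

# Lightweight router: maps a pooled scene feature to expert logits.
router_w = rng.normal(size=(EMBED_DIM, NUM_EXPERTS))

def route_and_prepend(scene_feature, token_embeddings):
    """Pick the expert whose prompt fits the scene cue; prepend its prompt.

    The frozen VLM then consumes the augmented sequence, so adaptation
    lives entirely in the prompt space, not in the model weights.
    """
    logits = scene_feature @ router_w
    expert = int(np.argmax(logits))        # hard top-1 routing
    prompt = expert_prompts[expert]        # (PROMPT_LEN, EMBED_DIM)
    return expert, np.concatenate([prompt, token_embeddings], axis=0)

scene = rng.normal(size=EMBED_DIM)         # pooled visual scene cue
tokens = rng.normal(size=(8, EMBED_DIM))   # embedded input tokens
expert, augmented = route_and_prepend(scene, tokens)
print(expert, augmented.shape)             # augmented: (PROMPT_LEN + 8, EMBED_DIM)
```

Because only `expert_prompts` and `router_w` would be trained, the pretrained parameters are never updated, which is how a prompt-space scheme can sidestep the forgetting induced by weight-space fine-tuning.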