Evolving Prompt Adaptation for Vision-Language Models

📅 2026-03-10
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses catastrophic forgetting in large vision-language models during few-shot fine-tuning, which often erodes the knowledge acquired during pre-training. To mitigate this issue, the authors propose EvoPrompt, a parameter-efficient framework that enables forgetting-free adaptation by decoupling the direction and magnitude of prompt evolution. EvoPrompt constructs a unified embedding space to generate cross-modal hierarchical prompts and incorporates three key components: a modality-shared prompt projector (MPP), an evolutionary training strategy, and feature geometric regularization (FGR). Together, these mechanisms prevent representation collapse and preserve the semantic knowledge encoded during pre-training. Experimental results show that EvoPrompt achieves state-of-the-art performance on few-shot tasks while largely retaining the model's zero-shot generalization capabilities.
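
The decoupling of direction and magnitude can be illustrated with a small sketch. It assumes one possible reading of the evolutionary strategy. A low-rank update `B @ A` is split column-wise into unit-norm directions and scalar magnitudes. In a later stage the directions are frozen and only the magnitudes receive gradient updates. The helpers `matmul`, `decompose`, `compose` and `magnitude_step` are hypothetical, as are the per-column magnitudes, the squared-error loss and the toy matrices. This is not the authors' implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two row-major matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def decompose(delta: Matrix, eps: float = 1e-12) -> Tuple[Matrix, List[float]]:
    """Split an update into unit-norm columns (direction) and column norms (magnitude)."""
    rows, cols = len(delta), len(delta[0])
    norms = [math.sqrt(sum(delta[i][j] ** 2 for i in range(rows))) for j in range(cols)]
    # Guard against division by zero for all-zero columns.
    direction = [
        [delta[i][j] / max(norms[j], eps) for j in range(cols)] for i in range(rows)
    ]
    return direction, norms


def compose(base: Matrix, direction: Matrix, magnitude: List[float]) -> Matrix:
    """Prompt = frozen base + per-column magnitude * frozen direction."""
    return [
        [base[i][j] + magnitude[j] * direction[i][j] for j in range(len(base[0]))]
        for i in range(len(base))
    ]


def magnitude_step(base, direction, magnitude, target, lr):
    """One gradient step on 0.5 * ||prompt - target||^2 w.r.t. magnitude only.

    dL/dm_j = sum_i (prompt_ij - target_ij) * direction_ij; direction is never updated.
    """
    prompt = compose(base, direction, magnitude)
    grads = [
        sum((prompt[i][j] - target[i][j]) * direction[i][j] for i in range(len(base)))
        for j in range(len(magnitude))
    ]
    return [m - lr * g for m, g in zip(magnitude, grads)]


# Stage 1 (assumed): a rank-1 update B @ A learned early fixes the semantic directions.
B = [[1.0], [2.0], [2.0]]  # 3 x 1
A = [[0.5, -1.0]]          # 1 x 2
delta = matmul(B, A)       # 3 x 2 low-rank update
direction, magnitude = decompose(delta)  # magnitude starts at [1.5, 3.0]

# Stage 2 (assumed): freeze the directions and adapt only the magnitudes
# toward a hypothetical target prompt.
base = [[0.0, 0.0] for _ in range(3)]
target = [[1.0, -1.0], [2.0, -2.0], [2.0, -2.0]]
for _ in range(50):
    magnitude = magnitude_step(base, direction, magnitude, target, lr=0.5)

print(magnitude)  # both entries converge to about 3.0
```

`magnitude_step` never modifies `direction`, so the normalized update keeps the same columns throughout the second stage and only their lengths change. This is the behavior the summary attributes to preserving early-learned semantic directions.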

πŸ“ Abstract
The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from catastrophic forgetting of pre-trained knowledge. To address this limitation, our work builds on the insight that governing the evolutionary path of prompts is essential for forgetting-free adaptation. To this end, we propose EvoPrompt, a novel framework designed to explicitly steer the prompt trajectory for stable, knowledge-preserving fine-tuning. Specifically, our approach employs a Modality-Shared Prompt Projector (MPP) to generate hierarchical prompts from a unified embedding space. Critically, an evolutionary training strategy decouples low-rank updates into directional and magnitude components, preserving early-learned semantic directions while adapting only their magnitude, thus enabling prompts to evolve without discarding foundational knowledge. This process is further stabilized by Feature Geometric Regularization (FGR), which enforces feature decorrelation to prevent representation collapse. Extensive experiments demonstrate that EvoPrompt achieves state-of-the-art performance in few-shot learning while robustly preserving the original zero-shot capabilities of pre-trained VLMs.
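
Feature decorrelation of the kind FGR is said to enforce is often written as a penalty on the off-diagonal entries of a batch correlation matrix. The sketch below assumes that form, similar in spirit to the covariance penalty used in VICReg. The abstract does not give FGR's exact loss. The function name `decorrelation_loss`, its normalization and the toy batches are all hypothetical.

```python
import math
from typing import List


def decorrelation_loss(features: List[List[float]], eps: float = 1e-8) -> float:
    """Mean squared off-diagonal correlation between feature dimensions.

    Hypothetical stand-in for a decorrelation regularizer, not the paper's FGR.
    features: N x d batch (N samples, d dimensions). Returns 0 when the
    dimensions are uncorrelated and approaches 1 when they are redundant.
    """
    n, d = len(features), len(features[0])
    if d < 2:
        return 0.0  # no pairs of dimensions to decorrelate
    # Standardize each dimension over the batch (zero mean, unit variance).
    means = [sum(row[k] for row in features) / n for k in range(d)]
    stds = [
        math.sqrt(sum((row[k] - means[k]) ** 2 for row in features) / n) + eps
        for k in range(d)
    ]
    z = [[(row[k] - means[k]) / stds[k] for k in range(d)] for row in features]
    # Correlation matrix C = Z^T Z / N.
    corr = [
        [sum(z[s][a] * z[s][b] for s in range(n)) / n for b in range(d)]
        for a in range(d)
    ]
    # Penalize only the off-diagonal entries, averaged over the d * (d - 1) pairs.
    off = sum(corr[a][b] ** 2 for a in range(d) for b in range(d) if a != b)
    return off / (d * (d - 1))


collapsed = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # dim 1 = 2 * dim 0
spread = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]  # uncorrelated dims
print(decorrelation_loss(collapsed))  # close to 1.0
print(decorrelation_loss(spread))     # 0.0
```

In the collapsed batch, one dimension is a scaled copy of the other, which pushes the penalty toward 1. Uncorrelated dimensions give 0. Minimizing such a term encourages each feature dimension to carry non-redundant information.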
Problem

Research questions and friction points this paper is trying to address.

vision-language models
catastrophic forgetting
prompt adaptation
few-shot learning
pre-trained knowledge preservation
Innovation

Methods, ideas, or system contributions that make the work stand out.

EvoPrompt
prompt evolution
modality-shared prompt projector
feature geometric regularization
catastrophic forgetting