Prompt-aligned Gradient for Prompt Tuning

📅 2022-05-30
🏛️ IEEE International Conference on Computer Vision
📈 Citations: 320
Influential: 44
🤖 AI Summary
Soft prompt tuning often suffers from catastrophic forgetting of general-purpose knowledge in vision-language models (e.g., CLIP) under few-shot settings, sometimes performing worse than zero-shot inference. Method: The paper proposes Prompt-aligned Gradient (ProGrad), an optimization mechanism that constrains prompt updates to gradient directions aligned (or non-conflicting) with the optimization direction given by zero-shot predictions from predefined prompts—thereby explicitly preserving task-agnostic, pre-trained knowledge without requiring additional data, regularization, or architectural modifications. Contribution/Results: ProGrad effectively mitigates overfitting and inter-class interference. It consistently outperforms state-of-the-art prompt-tuning methods across diverse transfer scenarios—including few-shot learning, domain generalization, base-to-novel class adaptation, and cross-dataset transfer—delivering substantial improvements in both generalization stability and accuracy.
📝 Abstract
Thanks to large pre-trained vision-language models (VLMs) like CLIP [37], we can craft a zero-shot classifier by discrete prompt design, e.g., the confidence score of an image being "[CLASS]" can be obtained by using the VLM-provided similarity between the image and the prompt sentence "a photo of a [CLASS]". Furthermore, prompting shows great potential for fast adaptation of VLMs to downstream tasks if we fine-tune the soft prompts with few samples. However, we find a common failure that improper fine-tuning or learning with extremely few-shot samples may even under-perform the zero-shot prediction. Existing methods still address this problem by using traditional anti-overfitting techniques such as early stopping and data augmentation, which lack a principled solution specific to prompting. In this paper, we present Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from forgetting the general knowledge learned from VLMs. In particular, ProGrad only updates the prompt whose gradient is aligned (or non-conflicting) to the general knowledge, which is represented as the optimization direction offered by the pre-defined prompt predictions. Extensive experiments under the few-shot learning, domain generalization, base-to-new generalization and cross-dataset transfer settings demonstrate the stronger few-shot generalization ability of ProGrad over state-of-the-art prompt tuning methods.
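The zero-shot classifier described in the abstract can be sketched as follows: score each class by the similarity between the image embedding and the embedding of "a photo of a [CLASS]". This is a minimal NumPy sketch with toy vectors standing in for CLIP encoder outputs; the embeddings and class names are illustrative, not from the paper.

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs):
    """Cosine-similarity scores of one image against per-class prompt embeddings."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return txt @ img  # one similarity score per class

# Toy stand-ins for CLIP encodings (illustrative values only).
classes = ["cat", "dog"]
prompts = [f"a photo of a {c}" for c in classes]  # discrete prompt design
text_embs = np.array([[1.0, 0.1],                 # pretend encoding of prompts[0]
                      [0.1, 1.0]])                # pretend encoding of prompts[1]
image_emb = np.array([0.9, 0.2])                  # pretend image encoding

scores = zero_shot_scores(image_emb, text_embs)
pred = classes[int(np.argmax(scores))]            # highest-similarity class wins
```

In the real model the embeddings come from CLIP's image and text encoders; only the scoring rule is shown here.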
Problem

Research questions and friction points this paper is trying to address.

Few-shot prompt tuning can forget the general knowledge pre-trained in VLMs
Improper fine-tuning may even under-perform zero-shot prediction
Existing anti-overfitting remedies (early stopping, data augmentation) lack a prompting-specific principle
Innovation

Methods, ideas, or system contributions that make the work stand out.

ProGrad aligns prompt gradients with general knowledge direction
Updates only non-conflicting gradients to prevent VLM forgetting
Uses the gradient of a KL loss toward zero-shot predictions as the alignment reference
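The update rule in the bullets above can be sketched concretely: let `g_ce` be the few-shot task (cross-entropy) gradient and `g_kl` the gradient of the KL loss toward the zero-shot predictions. If the two agree, the task gradient is used as-is; if they conflict, the component opposing `g_kl` is projected out. A minimal NumPy sketch, with illustrative toy vectors (the real gradients live in prompt-parameter space):

```python
import numpy as np

def prograd_update(g_ce, g_kl):
    """ProGrad-style gradient surgery (sketch): keep only the part of the
    task gradient that does not conflict with the general-knowledge
    direction given by the KL-to-zero-shot gradient."""
    if np.dot(g_ce, g_kl) >= 0:
        return g_ce  # aligned: update with the task gradient unchanged
    # conflicting: remove the component of g_ce that opposes g_kl
    return g_ce - (np.dot(g_ce, g_kl) / np.dot(g_kl, g_kl)) * g_kl

# Aligned case: gradient passes through untouched.
g1 = prograd_update(np.array([1.0, 0.0]), np.array([1.0, 1.0]))   # -> [1.0, 0.0]
# Conflicting case: projection leaves a gradient orthogonal to g_kl.
g2 = prograd_update(np.array([1.0, 0.0]), np.array([-1.0, 1.0]))  # -> [0.5, 0.5]
```

After projection the resulting update never has a negative component along the zero-shot direction, which is how the method avoids overwriting the pre-trained knowledge while still fitting the few-shot task.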