Debiased Fine-Tuning for Vision-language Models by Prompt Regularization

πŸ“… 2023-01-29
πŸ›οΈ AAAI Conference on Artificial Intelligence
πŸ“ˆ Citations: 25
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address overfitting and poor out-of-distribution (OOD) generalization when fine-tuning vision-language models on biased downstream data, this paper proposes Prompt Regularization (ProReg). ProReg treats the pretrained model's zero-shot prompt predictions as a knowledge source that is independent of the downstream data distribution, and introduces a sample-adaptive joint loss combining KL divergence and cross-entropy to dynamically regularize fine-tuning, without adding any trainable parameters. By pairing class-level prompts (e.g., "a photo of a [CLASS]") with a sample-wise dynamic weighting mechanism, ProReg transfers pretrained knowledge while avoiding downstream dataset bias. Extensive experiments on multiple OOD benchmarks demonstrate that ProReg consistently outperforms standard fine-tuning, prompt tuning, and zero-shot prompting, achieving significant gains in generalization performance.
πŸ“ Abstract
We present a new paradigm for fine-tuning large-scale vision-language pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg). Different from traditional fine-tuning, which easily overfits to the downstream task data, ProReg uses the prediction obtained by prompting the pretrained model to regularize the fine-tuning. The motivation is: by prompting the large model with "a photo of a [CLASS]", the fill-in answer depends only on the pretraining encyclopedic knowledge and is independent of the task data distribution, which is usually biased. Specifically, for each training sample prediction during fine-tuning, we first calculate the Kullback-Leibler loss against the prompt prediction and the cross-entropy loss against the ground-truth label, and then combine them with a proposed sample-wise adaptive trade-off weight, which automatically adjusts the transfer between the pretrained and downstream domains. On various out-of-distribution benchmarks, we show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompting, prompt tuning, and other state-of-the-art methods.
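The objective described in the abstract can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the per-sample weight `alpha` below (the prompt model's confidence in the ground-truth class) is a hypothetical stand-in, since the paper defines its own sample-wise adaptive trade-off.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def proreg_loss(student_logits, prompt_logits, labels):
    """Sketch of a ProReg-style objective for one batch.

    Combines cross-entropy against ground-truth labels with a KL term
    that pulls the fine-tuned model toward the frozen zero-shot prompt
    prediction. `alpha` is a hypothetical adaptive weight, not the
    paper's exact formula.
    """
    p_student = softmax(student_logits)          # fine-tuned model prediction
    p_prompt = softmax(prompt_logits)            # frozen zero-shot prompt prediction
    n = np.arange(len(labels))
    ce = -np.log(p_student[n, labels] + 1e-12)   # cross-entropy with labels
    kl = np.sum(p_prompt * (np.log(p_prompt + 1e-12)
                            - np.log(p_student + 1e-12)), axis=-1)
    alpha = p_prompt[n, labels]                  # hypothetical sample-wise weight
    return float(np.mean((1 - alpha) * ce + alpha * kl))

# Toy batch: two samples, two classes.
student = np.array([[2.0, 0.0], [0.0, 2.0]])
prompt = np.array([[1.5, 0.0], [0.0, 1.5]])
labels = np.array([0, 1])
loss = proreg_loss(student, prompt, labels)
```

When the prompt model is confident in the ground-truth class, `alpha` is high and the KL term dominates, transferring pretrained knowledge; when it is unsure, cross-entropy dominates and the model fits the labels.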
Problem

Research questions and friction points this paper is trying to address.

Reduces bias in vision-language model fine-tuning
Prevents overfitting to biased downstream task data
Balances pretrained knowledge and task adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt regularization for fine-tuning vision-language models
Adaptive trade-off weight balancing pretrained and downstream knowledge
KL loss and cross-entropy loss combination for debiasing
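The "zero-shot prompt prediction" that the regularizer relies on works in CLIP style: each class name is filled into a template such as "a photo of a [CLASS]", and the image is scored by cosine similarity against each class prompt's text embedding. A toy sketch, with random vectors standing in for real encoder outputs:

```python
import numpy as np

def zero_shot_prompt_logits(image_emb, class_text_embs, temperature=0.01):
    """CLIP-style zero-shot classification: cosine similarity between an
    image embedding and one text embedding per class prompt, scaled by a
    temperature. Embeddings here are toy stand-ins for encoder outputs."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    return txt @ img / temperature

# Toy example: 3 class prompts, 4-dim embeddings.
rng = np.random.default_rng(0)
texts = rng.normal(size=(3, 4))               # "a photo of a [CLASS]" embeddings
image = texts[1] + 0.1 * rng.normal(size=4)   # image close to class 1
logits = zero_shot_prompt_logits(image, texts)
pred = int(np.argmax(logits))
```

A softmax over these logits gives the prompt prediction that the KL term regularizes toward; since the prompts are fixed text, it adds no trainable parameters.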
πŸ”Ž Similar Papers
No similar papers found.