AI Summary
Parameter-efficient fine-tuning (PEFT) of vision-language models (VLMs) suffers from spurious correlations and degraded group robustness when the fine-tuning data is imbalanced across subgroups. Method: The paper proposes Group Context Optimization (GroupCoOp), a debiased PEFT algorithm that introduces group-specific textual prompts acting as multiple lightweight classifiers for each class. By leveraging the strong semantic generalization of the text encoder, GroupCoOp discovers effective prompts even for underrepresented groups and tightens the class embedding distributions. The approach fine-tunes only 0.016% of the model's parameters. Contribution/Results: GroupCoOp achieves significant improvements in group robustness on five benchmarks across five CLIP architectures, occasionally outperforming methods that fine-tune the entire network, and balances parameter efficiency, robustness, and fairness without architectural modification.
Abstract
Parameter-efficient fine-tuning (PEFT) of vision-language models (VLMs) excels in various vision tasks thanks to the rich knowledge and generalization ability of VLMs. However, recent studies have revealed that such fine-tuned VLMs are vulnerable to spurious correlations stemming from subgroup imbalance in the fine-tuning datasets. To resolve this issue, we propose Group Context Optimization (GroupCoOp), a simple and effective debiased fine-tuning algorithm that enhances the group robustness of fine-tuned VLMs. Its key idea is to employ group-specific text prompts as group representatives, which serve as multiple classifiers for their target class. The rich semantic knowledge of the VLM's text encoder enables the discovery of effective group prompts even for groups with few training samples. Leveraging the group prompts for each class addresses the issues caused by a group-imbalanced training set, such as the neglect of minority groups and the scattered distribution of each class in the embedding space. GroupCoOp achieved the best results on five benchmarks across five CLIP architectures and occasionally outperformed prior methods that fine-tune the entire network, despite training only 0.016% of the network's parameters.
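The core mechanism (several group-specific prompt embeddings per class, with the class score taken over all of that class's prompts) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy 2-D embeddings, the function name, and the max-pooling reduction over group scores are all assumptions made for clarity; in the actual method the prompts are learned context vectors fed through a CLIP text encoder.

```python
import math


def group_prompt_logits(image_emb, group_prompt_embs, groups_per_class):
    """Classify an image embedding against group-specific prompt embeddings.

    Each class owns `groups_per_class` consecutive prompts in
    `group_prompt_embs` (one per subgroup). The class logit is the maximum
    cosine similarity over that class's group prompts, so a minority-group
    prompt can still carry the prediction for its class.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    # Similarity of the image to every group prompt.
    sims = [cosine(image_emb, p) for p in group_prompt_embs]
    # Max-pool group-level scores into one logit per class.
    return [max(sims[i:i + groups_per_class])
            for i in range(0, len(sims), groups_per_class)]


# Toy example: 2 classes x 2 groups, 2-D embeddings (hypothetical values).
prompts = [
    [1.0, 0.0], [0.9, 0.1],   # class 0: group prompts
    [0.0, 1.0], [0.1, 0.9],   # class 1: group prompts
]
logits = group_prompt_logits([0.05, 1.0], prompts, groups_per_class=2)
predicted_class = max(range(len(logits)), key=lambda c: logits[c])
```

Here the image embedding lies near class 1's prompts, so class 1 wins even though only one of its group prompts matches closely, which is the intuition behind using multiple group prompts as classifiers for a single class.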