GroupCoOp: Group-robust Fine-tuning via Group Prompt Learning

πŸ“… 2025-09-28
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Parameter-efficient fine-tuning (PEFT) of vision-language models (VLMs) suffers from spurious correlations and degraded group robustness caused by subgroup imbalance in the fine-tuning data. Method: The paper proposes Group Context Optimization (GroupCoOp), a debiased PEFT framework that learns group-specific textual prompts acting as multiple lightweight classifiers for each class. Leveraging the text encoder's strong semantic generalization, GroupCoOp discovers effective group prompts even for underrepresented subgroups and tightens each class's distribution in the embedding space. Only 0.016% of the model's parameters are fine-tuned. Contribution/Results: GroupCoOp achieves the best group robustness on five benchmarks across five CLIP architectures, occasionally outperforming methods that fine-tune the entire network, balancing parameter efficiency, robustness, and fairness without architectural modification.

πŸ“ Abstract
Parameter-efficient fine-tuning (PEFT) of vision-language models (VLMs) excels in various vision tasks thanks to the rich knowledge and generalization ability of VLMs. However, recent studies have revealed that such fine-tuned VLMs are vulnerable to spurious correlations stemming from subgroup imbalance in the fine-tuning datasets. To resolve this issue, we propose Group Context Optimization (GroupCoOp), a simple and effective debiased fine-tuning algorithm that enhances the group robustness of fine-tuned VLMs. Its key idea is to employ group-specific text prompts as group representatives serving as multiple classifiers for their target class. The rich semantic knowledge of the VLM's text encoder enables the discovery of effective group prompts even for groups with a small number of training samples. Leveraging the group prompts for each class addresses the issues caused by the group-imbalanced training set, such as the neglect of minority groups and the scattered distribution of each class in the embedding space. GroupCoOp achieved the best results on five benchmarks across five CLIP architectures and occasionally outperformed prior methods that fine-tune the entire network, despite training only 0.016% of the network's parameters.
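The core mechanism described in the abstract, group-specific text prompts acting as multiple classifiers per class, can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the frozen text encoder's prompt embeddings are simulated with random unit vectors, and the max-over-groups aggregation at inference is an assumption (the paper may aggregate group prompts differently).

```python
import numpy as np

# Toy dimensions; a real CLIP setup would use the encoder's embedding size.
rng = np.random.default_rng(0)
num_classes, num_groups, dim = 3, 2, 8

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere, as in CLIP's cosine scoring."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# One learned prompt embedding per (class, group) pair: shape (C, G, D).
# In GroupCoOp these would come from the frozen text encoder applied to
# learnable group-specific contexts; random stand-ins are used here.
group_prompts = l2_normalize(rng.standard_normal((num_classes, num_groups, dim)))

def classify(image_emb):
    """Score each class by its best-matching group prompt.

    Each class is represented by several group prompts instead of a single
    text classifier, so minority groups keep their own representative.
    """
    image_emb = l2_normalize(image_emb)
    sims = group_prompts @ image_emb      # cosine similarities, shape (C, G)
    class_scores = sims.max(axis=1)       # aggregate over a class's group prompts
    return int(class_scores.argmax())

pred = classify(rng.standard_normal(dim))
```

At training time, only the context vectors feeding the text encoder would be optimized (the reported 0.016% of parameters), with both CLIP encoders kept frozen.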
Problem

Research questions and friction points this paper is trying to address.

Addresses vulnerability to spurious correlations in fine-tuned vision-language models
Enhances group robustness by using group-specific prompts as classifiers
Mitigates issues from group-imbalanced training sets like minority neglect
Innovation

Methods, ideas, or system contributions that make the work stand out.

Group-specific prompts as multiple classifiers
Text encoder's rich semantic knowledge yields effective prompts even for groups with few training samples
Group prompts counteract group imbalance, e.g., minority-group neglect and scattered class embeddings
πŸ”Ž Similar Papers
No similar papers found.