🤖 AI Summary
Image-text foundation models suffer from spurious correlations between inputs and labels, degrading group robustness, and existing mitigation methods rely on scarce group annotations. To address this, we propose PPA, a three-stage parameter-efficient fine-tuning framework that requires no group labels. PPA introduces a "Project-Probe-Aggregate" paradigm: (1) project image features onto the nullspace of class proxies to expose bias-confounded directions; (2) infer group labels implicitly with the resulting biased classifier; and (3) correct class weights by aggregating per-group weights. We theoretically show that PPA improves minority-group identification and asymptotically approximates the Bayes-optimal classifier for balanced group error. Experiments demonstrate that PPA outperforms state-of-the-art methods in average worst-group accuracy while introducing fewer than 0.01% additional trainable parameters and eliminating reliance on group annotations entirely.
📝 Abstract
While image-text foundation models have succeeded across diverse downstream tasks, they still struggle in the presence of spurious correlations between input and label. To address this issue, we propose a simple three-step approach, Project-Probe-Aggregate (PPA), that enables parameter-efficient fine-tuning of foundation models without relying on group annotations. Building on the failure-based debiasing scheme, PPA improves its two key components: minority-sample identification and the robust training algorithm. Specifically, we first train biased classifiers by projecting image features onto the nullspace of class proxies obtained from the text encoder. Next, we infer group labels using the biased classifier and probe group targets with prior correction. Finally, we aggregate the group weights of each class to produce the debiased classifier. Our theoretical analysis shows that PPA enhances minority-group identification and is Bayes optimal for minimizing the balanced group error, mitigating spurious correlations. Extensive experiments confirm the effectiveness of PPA: it outperforms the state of the art in average worst-group accuracy while requiring fewer than 0.01% of the parameters to be tuned and no group labels during training.
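The projection and aggregation steps above can be sketched in a few lines of NumPy. This is an illustrative sketch under our own naming (`nullspace_projection` and `aggregate_group_weights` are hypothetical helpers, not the authors' code): projecting image features onto the nullspace of the class-proxy matrix removes class-discriminative directions, so a probe trained on the projected features tends to capture bias instead, and the final debiased classifier averages the per-group weights within each class.

```python
import numpy as np

def nullspace_projection(class_proxies):
    """P = I - W^T (W W^T)^+ W projects vectors onto the nullspace of W,
    i.e. the subspace orthogonal to every class proxy."""
    W = class_proxies                      # (C, d): text embeddings of the class names
    return np.eye(W.shape[1]) - W.T @ np.linalg.pinv(W @ W.T) @ W

def aggregate_group_weights(group_weights):
    """Average the per-group classifier weights within each class
    to form a single debiased weight vector per class."""
    return group_weights.mean(axis=1)      # (C, G, d) -> (C, d)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))               # 3 class proxies in a 16-d feature space
P = nullspace_projection(W)
z = rng.normal(size=16)                    # one image feature
z_null = P @ z                             # feature with class information removed
print(np.allclose(W @ z_null, 0.0))        # True: orthogonal to all class proxies
```

A biased classifier would then be trained on `z_null`-style features; its predictions serve as inferred group labels for the probe-and-aggregate stages.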