🤖 AI Summary
This work addresses the significant performance degradation of vision-language models under distribution shifts. Existing prompt-based test-time adaptation methods rely on entropy minimization, which can amplify spurious correlations and produce overconfident errors. To mitigate the feature bias induced by shared visual evidence, the authors propose Fair Context Learning (FCL), a novel framework that incorporates fairness constraints into test-time adaptation, marking the first such application in this setting. Built on an additive evidence decomposition assumption, FCL decouples adaptation into two stages, augmentation-driven category exploration and fairness-guided textual context calibration, avoiding entropy minimization entirely. Experiments show that FCL performs on par with state-of-the-art test-time adaptation methods across diverse domain-shift and fine-grained benchmarks, supporting both its empirical effectiveness and its theoretical motivation.
📝 Abstract
Vision-Language Models (VLMs) such as CLIP enable strong zero-shot recognition but suffer substantial degradation under distribution shifts. Test-Time Adaptation (TTA) aims to improve robustness using only unlabeled test samples, yet most prompt-based TTA methods rely on entropy minimization -- an approach that can amplify spurious correlations and induce overconfident errors when classes share visual features. We propose Fair Context Learning (FCL), an episodic TTA framework that avoids entropy minimization by explicitly addressing shared-evidence bias. Motivated by our additive evidence decomposition assumption, FCL decouples adaptation into (i) augmentation-based exploration to identify plausible class candidates, and (ii) fairness-driven calibration that adapts text contexts to equalize sensitivity to common visual evidence. This fairness constraint mitigates over-reliance on partial, shared features and enables effective calibration of text embeddings without relying on entropy reduction. Through extensive evaluation, we empirically validate our theoretical motivation and show that FCL achieves competitive adaptation performance relative to state-of-the-art TTA methods across diverse domain-shift and fine-grained benchmarks.
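The two-stage procedure in the abstract can be sketched in a few lines of numpy. Everything below is an illustrative assumption, not the paper's implementation: random vectors stand in for CLIP image/text embeddings, the "fairness" objective is modeled as minimizing the variance of candidate classes' similarity to the mean augmented-view feature, and the update ignores the normalization term in the gradient for simplicity.

```python
import numpy as np

def normalize(x, axis=-1):
    """L2-normalize along the last axis (CLIP-style cosine similarity)."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def fcl_sketch(text_emb, aug_emb, k=3, steps=50, lr=0.1):
    """Hedged sketch of FCL's two stages; details are hypothetical.

    text_emb: (C, d) class text embeddings
    aug_emb:  (A, d) image embeddings of augmented views of one test sample
    """
    T = normalize(text_emb)
    V = normalize(aug_emb)

    # Stage (i): augmentation-based exploration -- average cosine logits
    # over views and keep the top-k plausible class candidates.
    mean_logits = (V @ T.T).mean(0)            # (C,)
    cand = np.argsort(mean_logits)[-k:]

    # Stage (ii): fairness-driven calibration -- learn per-candidate
    # context offsets so candidates respond equally to the *shared*
    # visual evidence (mean view feature); no entropy minimization.
    shared = normalize(V.mean(0))              # (d,) shared evidence
    ctx = np.zeros((k, T.shape[1]))            # learnable context offsets
    for _ in range(steps):
        cal = normalize(T[cand] + ctx)         # calibrated candidate texts
        s = cal @ shared                       # sensitivity to shared evidence
        # approximate gradient of Var(s) w.r.t. ctx (normalization term
        # dropped for simplicity in this sketch)
        grad = (2.0 / k) * (s - s.mean())[:, None] * shared[None, :]
        ctx -= lr * grad

    # Predict among candidates using the calibrated text contexts.
    cal = normalize(T[cand] + ctx)
    pred = cand[np.argmax((V @ cal.T).mean(0))]
    return pred, cand
```

In this toy setup the calibration step shrinks the candidates' differences along the shared-evidence direction, so the final decision leans more on class-specific residual features, which is the intuition the abstract attributes to the fairness constraint.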