🤖 AI Summary
Vision-language models such as CLIP exhibit stereotypical biases along social attributes (e.g., gender, age), undermining their neutrality and fairness. To address this, we propose a fine-grained debiasing method that requires no attribute annotations: it removes sensitive attribute information from the text feature space, and does so only for attribute-neutral text prompts, so descriptions that explicitly state an attribute keep that information. Capitalizing on CLIP’s dual-encoder architecture, we design a lightweight neutral filter module that preserves semantic expressiveness while attenuating bias. Experiments show that our method outperforms existing adversarial training and test-time projection baselines across multiple social bias evaluation benchmarks. It achieves this bias mitigation without compromising downstream task performance, making it the first approach to enable annotation-free, fine-grained, and semantics-preserving debiasing of social attributes in vision-language models.
📝 Abstract
Large-scale vision-language models, such as CLIP, are known to contain harmful societal bias regarding protected attributes (e.g., gender and age). In this paper, we aim to address societal bias in CLIP. Although previous studies have proposed mitigating societal bias through adversarial learning or test-time projection, our comprehensive study of these works identifies two critical limitations: 1) loss of attribute information even when it is explicitly disclosed in the input, and 2) reliance on attribute annotations during the debiasing process. To mitigate societal bias in CLIP and overcome both limitations, we introduce a simple yet effective debiasing method called SANER (societal attribute neutralizer), which eliminates attribute information from CLIP text features only for attribute-neutral descriptions. Experimental results show that SANER, which does not require attribute annotations and preserves the original information in attribute-specific descriptions, achieves stronger debiasing than existing methods.
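The first limitation can be illustrated with a small numerical sketch. This is not SANER itself, which the abstract describes only at a high level. It is a toy contrast between projecting an attribute direction out of every text feature and leaving attribute-specific features untouched. The 3-D vectors, the `attribute_direction`, and the `is_neutral` flags are all hypothetical values chosen for illustration, and the selective projection stands in for whatever neutralizer the method actually learns.

```python
import numpy as np

def project_out(features, direction):
    """Remove the component of each row of `features` along `direction`."""
    unit = direction / np.linalg.norm(direction)
    return features - np.outer(features @ unit, unit)

# Hypothetical toy "text features"; axis 0 plays the role of a gender direction.
attribute_direction = np.array([1.0, 0.0, 0.0])
features = np.array([
    [0.9, 0.3, 0.1],   # e.g. "a photo of a woman cooking"  (attribute-specific)
    [-0.9, 0.3, 0.1],  # e.g. "a photo of a man cooking"    (attribute-specific)
    [0.2, 0.3, 0.1],   # e.g. "a photo of a person cooking" (attribute-neutral)
])
# Assumed to be given here; how neutrality is decided is outside this sketch.
is_neutral = np.array([False, False, True])

# Projecting everything: the two attribute-specific rows become identical,
# so the explicitly stated attribute is lost.
projected_all = project_out(features, attribute_direction)

# Neutralizing only attribute-neutral rows: explicit attributes survive,
# while the neutral description no longer leans toward either side.
neutral_only = np.where(is_neutral[:, None], projected_all, features)
```

In this toy setting, `projected_all` can no longer distinguish the first two descriptions, whereas `neutral_only` keeps them distinct and zeroes the attribute component of the neutral description only.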