Leveraging vision-language models for fair facial attribute classification

📅 2024-03-15
📈 Citations: 1
Influential: 0
🤖 AI Summary
Deep learning models for facial attribute classification often exhibit inter-group performance disparities due to data imbalance or spurious correlations, while existing unsupervised fairness methods lack explicit optimization of opportunity-equality–based metrics. To address this, we propose the first sensitive-label-free, fully unsupervised fairness correction paradigm. Leveraging vision-language models (VLMs) such as CLIP, our method implicitly models sensitive attribute distributions via textual prompts to identify bias-prone samples. We then introduce a VLM-confidence-guided resampling and semantic augmentation strategy to enhance downstream classifier fairness without access to sensitive labels. Extensive experiments across multiple benchmarks demonstrate that our approach significantly reduces Equalized Odds disparity—by up to 38%—and consistently outperforms state-of-the-art unsupervised fairness methods in both fairness and accuracy.
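The confidence-guided resampling described in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes we already have a per-sample VLM confidence score (how clearly the sensitive attribute is encoded), and upweights low-confidence, presumably bias-conflicting samples when drawing the training distribution.

```python
import numpy as np

def resample_weights(vlm_confidence, temperature=1.0):
    """Turn per-sample VLM confidence into a sampling distribution.

    vlm_confidence: higher means the VLM sees clear attribute
    information (attribute-typical sample); lower means ambiguous,
    likely bias-conflicting. Low-confidence samples get more weight.
    """
    conf = np.asarray(vlm_confidence, dtype=float)
    # Inverse-confidence weighting via a softmax over -confidence.
    w = np.exp(-conf / temperature)
    return w / w.sum()

# Hypothetical confidences for four samples; the last two are the
# ambiguous, under-performing ones the method would upsample.
conf = np.array([0.9, 0.8, 0.2, 0.1])
p = resample_weights(conf)
rng = np.random.default_rng(0)
resampled_idx = rng.choice(len(conf), size=1000, p=p)
```

The `temperature` parameter (an assumption here, not from the paper) controls how aggressively low-confidence samples are favored.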

📝 Abstract
Performance disparities of image recognition across different demographic populations are known to exist in deep learning-based models, but previous work has largely addressed such fairness problems assuming knowledge of sensitive attribute labels. To overcome this reliance, previous strategies have involved separate learning structures to expose and adjust for disparities. In this work, we explore a new paradigm that does not require sensitive attribute labels and evades the need for extra training by leveraging a general-purpose vision-language model (VLM) as a rich knowledge source for common sensitive attributes. We analyze the correspondence between VLM-predicted and human-defined sensitive attribute distributions. We find that VLMs can recognize samples with clear attribute information encoded in image representations, and thus capture under-performing samples that conflict with attribute-related bias. We train downstream target classifiers by re-sampling and augmenting under-performing attribute groups. Extensive experiments on multiple benchmark facial attribute classification datasets show fairness gains of the model over existing unsupervised baselines that tackle arbitrary bias. The work indicates that vision-language models can extract discriminative sensitive information prompted by language and can be used to promote model fairness.
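The prompt-based attribute estimation the abstract describes can be sketched as a CLIP-style zero-shot scoring step: cosine similarity between an image embedding and one text-prompt embedding per attribute value, softmaxed into a confidence distribution. Random vectors stand in for real embeddings so the sketch runs offline; a real pipeline would encode images and prompts (e.g. "a photo of a young person" vs. "a photo of an old person") with a pretrained VLM such as CLIP.

```python
import numpy as np

def prompt_attribute_confidence(image_emb, prompt_embs, scale=100.0):
    """Zero-shot sensitive-attribute estimate, CLIP-style.

    image_emb:   (d,) image embedding.
    prompt_embs: (k, d) text embeddings, one per attribute value.
    Returns a (k,) softmax confidence distribution over values.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    logits = scale * (txt @ img)          # scaled cosine similarities
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

# Stand-in embeddings (hypothetical; real ones come from a VLM encoder).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
prompt_embs = rng.normal(size=(2, 512))  # one prompt per attribute value
probs = prompt_attribute_confidence(image_emb, prompt_embs)
```

A near-uniform `probs` flags a sample whose attribute information is weakly encoded, which is the signal the paper uses to surface under-performing, bias-conflicting samples.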
Problem

Research questions and friction points this paper is trying to address.

Optimizes parity-based fairness metrics without group labels
Addresses performance disparities across demographic groups in image recognition
Enables flexible fairness criteria optimization via vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Directly optimizes parity-based fairness metrics
Leverages vision-language models for attribute analysis
Formulates loss functions for flexible fairness criteria