🤖 AI Summary
Large Vision-Language Models (VLMs) exhibit significant societal biases along gender and racial dimensions, undermining their fairness and social trustworthiness. This work is the first to concurrently assess bias in both generated responses and the underlying probability distributions during inference, uncovering a bidirectional regulatory effect of hidden-layer residuals on fairness. We propose a training-free, model-agnostic post-processing method: using per-layer residual decomposition and fairness-sensitivity analysis, it selectively suppresses bias-correlated residuals while amplifying fairness-correlated ones, enabling dynamic fairness calibration without fine-tuning or retraining. Evaluated on the PAIRS and SocialCounterfactuals benchmarks, our approach substantially reduces response-level bias and improves cross-group confidence calibration, outperforming state-of-the-art training-based debiasing methods.
📝 Abstract
Social bias is a critical issue in large vision-language models (VLMs), where fairness- and ethics-related failures harm particular social groups. The extent to which VLMs produce social bias in generative responses remains unknown. In this study, we evaluate and mitigate social bias at both the response level and the probability-distribution level. To do so, we first evaluate four state-of-the-art VLMs on the PAIRS and SocialCounterfactuals datasets using a multiple-choice selection task. Surprisingly, we find that the models generate gender- and race-biased responses. We also observe that models tend to assert that their responses are fair while in fact holding mis-calibrated confidence levels towards particular social groups. Investigating why VLMs are unfair, we observe that their hidden layers exhibit substantial fluctuations in fairness levels. Meanwhile, the residuals in each layer have mixed effects on fairness: some contribute positively, while others increase bias. Based on these findings, we propose a post-hoc, inference-time method to mitigate social bias that is training-free and model-agnostic: during inference, we ablate bias-associated residuals while amplifying fairness-associated residuals in the model's hidden layers. We demonstrate that our post-hoc method outperforms competing training-based strategies, yielding fairer responses and more reliable confidence levels.
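The core intervention described above (ablating bias-associated residuals while amplifying fairness-associated ones during inference) can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function name `calibrate_residuals`, the per-residual sensitivity scores, and the `alpha`/`tau` knobs are all assumptions standing in for the paper's fairness-sensitivity analysis.

```python
import numpy as np

def calibrate_residuals(residuals, sensitivities, alpha=1.5, tau=0.0):
    """Recombine per-layer residual contributions, rescaled by a
    fairness-sensitivity score.

    Hypothetical scheme: a positive score marks a residual as
    fairness-associated, a negative score as bias-associated, and
    scores within [-tau, tau] as neutral.
    """
    hidden = np.zeros_like(residuals[0])
    for r, s in zip(residuals, sensitivities):
        if s < -tau:        # bias-associated residual: ablate (drop) it
            continue
        elif s > tau:       # fairness-associated residual: amplify it
            hidden = hidden + alpha * r
        else:               # neutral residual: pass through unchanged
            hidden = hidden + r
    return hidden

# Toy example: three residual contributions to a 4-dim hidden state.
residuals = [np.ones(4), 2 * np.ones(4), 3 * np.ones(4)]
scores = [-1.0, 0.0, 1.0]   # bias-linked, neutral, fairness-linked
print(calibrate_residuals(residuals, scores, alpha=2.0))  # → [8. 8. 8. 8.]
```

In a real VLM this rescaling would be applied to the decomposed residual-stream contributions inside each transformer layer at inference time (e.g. via forward hooks), which is what makes the approach training-free and model-agnostic.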