🤖 AI Summary
Multimodal large language models (MLLMs) often exacerbate social biases when enhanced for reasoning capabilities, yet it remains unclear whether a fundamental trade-off exists between reasoning performance and bias mitigation.
Method: We propose the first systematic framework for co-optimizing reasoning ability and social fairness in MLLMs, introducing a reinforcement learning (RL) approach guided by sample-ratio control that dynamically balances debiasing and reasoning samples, with the best mix found at a 1:4 ratio. Our method combines supervised fine-tuning, knowledge distillation, and rule-guided RL to comprehensively characterize the bias–reasoning Pareto frontier across multiple training paradigms.
Contribution/Results: Experiments demonstrate that our approach hits a "sweet spot": it retains 88% of the original reasoning accuracy while reducing stereotype scores by 10%. This yields a reproducible, scalable optimization pathway toward fairer and more capable MLLMs.
📝 Abstract
Multimodal Large Language Models (MLLMs) already achieve state-of-the-art results across a wide range of tasks and modalities. To push their reasoning ability further, recent studies explore advanced prompting schemes and post-training fine-tuning. Although these techniques improve logical accuracy, they frequently leave the models' outputs burdened with pronounced social biases. Clarifying how reasoning gains interact with bias mitigation, and whether the two objectives inherently trade off, therefore remains an open and pressing research problem. Our study begins by benchmarking three bias-mitigation strategies under identical conditions, supervised fine-tuning (SFT), knowledge distillation (KD), and rule-based reinforcement learning (RL), establishing their baseline strengths and weaknesses. Building on these results, we vary the proportion of debias-focused and reasoning-centric samples within each paradigm to chart the reasoning-versus-bias trade-off. Our sweeps reveal a consistent sweet spot: a roughly 1:4 mix trained with reinforcement learning cuts stereotype scores by 10% while retaining 88% of the model's original reasoning accuracy, offering concrete guidance for balancing fairness and capability in MLLMs.
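The 1:4 debias-to-reasoning sample mix described above can be sketched as a batch-composition routine. This is a minimal illustration under stated assumptions: the function name, pool structure, and batching scheme are hypothetical, not the authors' implementation.

```python
import random

def build_mixed_batch(debias_pool, reasoning_pool, batch_size=40, debias_ratio=0.2):
    """Compose one training batch with a fixed debias:reasoning mix.

    A debias_ratio of 0.2 corresponds to the 1:4 sweet spot reported
    in the paper (1 debias sample per 4 reasoning samples). The pools
    here are illustrative; in practice they would hold prompt/response
    pairs for the RL objective.
    """
    n_debias = round(batch_size * debias_ratio)   # e.g. 8 of 40
    n_reason = batch_size - n_debias              # e.g. 32 of 40
    batch = (random.sample(debias_pool, n_debias)
             + random.sample(reasoning_pool, n_reason))
    random.shuffle(batch)  # interleave so each update sees both objectives
    return batch

# Toy usage: a 1:4 mix drawn from labeled string pools
debias = [f"debias_{i}" for i in range(100)]
reason = [f"reason_{i}" for i in range(400)]
batch = build_mixed_batch(debias, reason)
```

Keeping the ratio fixed per batch (rather than sampling each item independently with probability 0.2) guarantees every update step carries the intended objective balance.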