🤖 AI Summary
This work investigates the intrinsic relationship between language models' reasoning capabilities and fairness, revealing that stronger reasoning abilities inherently mitigate stereotypical bias. To leverage this insight, we propose Reasoning-Guided Fine-Tuning (ReGiFT), a bias-agnostic fine-tuning method that injects generic, structured reasoning trajectories to progressively guide models toward deep, logically consistent reasoning, without requiring fairness annotations, domain-specific supervision, or bias labels. ReGiFT achieves zero-shot fairness transfer using only publicly available general-purpose reasoning data. Experiments show that ReGiFT substantially improves performance across multiple fairness benchmarks, outperforming state-of-the-art specialized reasoning and debiasing models. Critically, it provides the first empirical validation of the generalizable principle that "enhancing reasoning inherently improves fairness," establishing a novel paradigm for fairness-aware modeling grounded in reasoning augmentation rather than explicit bias mitigation.
📝 Abstract
Recent advances in large-scale generative language models have shown that reasoning capabilities can significantly improve model performance across a variety of tasks. However, the impact of reasoning on a model's ability to mitigate stereotypical responses remains largely underexplored. In this work, we investigate the crucial relationship between a model's reasoning ability and fairness, and ask whether improved reasoning capabilities can mitigate harmful stereotypical responses, especially those arising from shallow or flawed reasoning. We conduct a comprehensive evaluation of multiple open-source LLMs and find that larger models with stronger reasoning abilities exhibit substantially lower stereotypical bias on existing fairness benchmarks. Building on this insight, we introduce ReGiFT (Reasoning-Guided Fine-Tuning), a novel approach that extracts structured reasoning traces from advanced reasoning models and infuses them into models that lack such capabilities. We use only general-purpose reasoning data and require no fairness-specific supervision for bias mitigation. Notably, we find that models fine-tuned with ReGiFT not only improve fairness relative to their non-reasoning counterparts but also outperform advanced reasoning models on fairness benchmarks. We further analyze how the correctness and length of the reasoning traces influence model fairness and overall performance. Our findings highlight that enhancing reasoning capabilities is an effective, fairness-agnostic strategy for mitigating stereotypical bias caused by reasoning flaws.
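The abstract describes ReGiFT as fine-tuning on structured reasoning traces distilled from a stronger model, filtered by correctness and length. As a rough illustration only, the data-preparation step might look like the sketch below; the `<think>` trace format, field names, and length threshold are assumptions for illustration, not the authors' released pipeline.

```python
# Illustrative sketch of ReGiFT-style data preparation (assumed format,
# not the paper's actual code): wrap general-purpose reasoning traces
# into supervised fine-tuning targets, keeping only correct traces
# within a length budget.

def build_sft_example(question, trace, answer):
    """Train the student model to emit the structured reasoning trace
    before its final answer, so it internalizes step-by-step reasoning."""
    target = f"<think>\n{trace.strip()}\n</think>\n{answer.strip()}"
    return {"prompt": question.strip(), "completion": target}

def filter_traces(examples, max_trace_tokens=512):
    """Discard traces with incorrect final answers or excessive length --
    the abstract notes both correctness and trace length influence
    downstream fairness and performance."""
    kept = []
    for ex in examples:
        trace_len = len(ex["trace"].split())  # crude whitespace token proxy
        if ex["is_correct"] and trace_len <= max_trace_tokens:
            kept.append(build_sft_example(ex["question"], ex["trace"], ex["answer"]))
    return kept
```

The resulting prompt/completion pairs could then feed any standard supervised fine-tuning loop; no fairness labels appear anywhere in the data, which is the bias-agnostic property the paper emphasizes.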