LFTF: Locating First and Then Fine-Tuning for Mitigating Gender Bias in Large Language Models

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) inherit gender bias from their training data, necessitating systematic evaluation and mitigation. Method: The paper proposes a framework featuring (i) GenBiasEval and GenHintEval, a pair of benchmark datasets that jointly quantify bias strength and prompt-level consistency; (ii) two metrics, AFGB-Score (Absolutely Fair Gender Bias Score) and UB-Score (UnBias Score); (iii) BMI (Block Mitigating Importance Score), a block-level bias localization mechanism; and (iv) LFTF, a two-stage debiasing algorithm that first identifies the most bias-relevant block via BMI and then applies customized loss-driven local fine-tuning to that block. Results: Experiments show that LFTF significantly reduces gender bias (AFGB-Score improves by over 40%) while preserving general capabilities (zero-shot accuracy drops by less than 0.8%).
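To make the "locating" step concrete, here is a minimal sketch of BMI-style block ranking: silence each block's contribution in turn and measure how much a bias probe moves. Everything here is an illustrative assumption, not the paper's implementation: `ToyLM`, `bias_probe`, and `rank_blocks_by_bmi` are hypothetical names, and the probe is a stand-in for the role AFGB-Score plays on GenBiasEval.

```python
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Tiny stand-in for a transformer: a stack of residual MLP 'blocks'."""
    def __init__(self, dim=16, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_blocks)
        )
        self.head = nn.Linear(dim, 2)  # stand-in logits for "he" vs "she"

    def forward(self, x):
        for blk in self.blocks:
            x = x + blk(x)  # residual connection, as in transformer blocks
        return self.head(x)

def bias_probe(model, x):
    """Stand-in bias probe: mean absolute gap between the two gendered
    logits (AFGB-Score on GenBiasEval plays this role in the paper)."""
    with torch.no_grad():
        logits = model(x)
        return (logits[:, 0] - logits[:, 1]).abs().mean().item()

def rank_blocks_by_bmi(model, x):
    """Rank blocks by how much silencing each one moves the bias probe."""
    baseline = bias_probe(model, x)
    scores = []
    for idx, blk in enumerate(model.blocks):
        # Replace this block's output with zeros for one forward pass.
        handle = blk.register_forward_hook(lambda m, inp, out: torch.zeros_like(out))
        scores.append((idx, abs(bias_probe(model, x) - baseline)))
        handle.remove()
    # Descending order: the top entry is the block LFTF would fine-tune.
    return sorted(scores, key=lambda s: s[1], reverse=True)

torch.manual_seed(0)
model, x = ToyLM(), torch.randn(8, 16)
print(rank_blocks_by_bmi(model, x))
```

Zero-ablation via a forward hook is just one cheap proxy for "relevance to gender bias"; the paper defines BMI precisely, and any per-block sensitivity measure could slot into the same ranking loop.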

📝 Abstract
Nowadays, Large Language Models (LLMs) have attracted widespread attention due to their powerful performance. However, due to unavoidable exposure to socially biased data during training, LLMs tend to exhibit social biases, particularly gender bias. To better explore and quantify the degree of gender bias in LLMs, we propose a pair of datasets named GenBiasEval and GenHintEval. GenBiasEval is responsible for evaluating the degree of gender bias in LLMs, accompanied by an evaluation metric named AFGB-Score (Absolutely Fair Gender Bias Score). Meanwhile, GenHintEval is used to assess whether LLMs can provide responses consistent with prompts that contain gender hints, along with the accompanying evaluation metric UB-Score (UnBias Score). In addition, to mitigate gender bias in LLMs more effectively, we present the LFTF (Locating First and Then Fine-Tuning) algorithm. The algorithm first ranks specific LLM blocks by their relevance to gender bias in descending order using a metric called BMI (Block Mitigating Importance Score). Based on this ranking, the block most strongly associated with gender bias is then fine-tuned using a carefully designed loss function. Extensive experiments show that our proposed LFTF algorithm can significantly mitigate gender bias in LLMs while maintaining their general capabilities.
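The abstract only says the fine-tuning loss is "carefully designed", so the following is a hedged sketch of the second LFTF stage under stated assumptions: freeze every parameter except those of the BMI-top block, then minimize an illustrative loss that penalizes the gap between two gendered logits. `gap_loss` and `lftf_finetune` are hypothetical names; the paper's actual objective is more elaborate.

```python
import torch
import torch.nn as nn

def gap_loss(model: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Illustrative debiasing loss: squared gap between two gendered
    output logits (a placeholder for the paper's designed loss)."""
    logits = model(batch)
    return ((logits[:, 0] - logits[:, 1]) ** 2).mean()

def lftf_finetune(model, top_block, batches, loss_fn, lr=1e-4):
    """LFTF stage two (sketch): freeze all parameters, unfreeze only the
    block BMI ranked as most bias-relevant, and fine-tune that block."""
    for p in model.parameters():
        p.requires_grad_(False)
    for p in top_block.parameters():
        p.requires_grad_(True)
    opt = torch.optim.AdamW(top_block.parameters(), lr=lr)
    for batch in batches:
        opt.zero_grad()
        loss_fn(model, batch).backward()
        opt.step()  # only the located block's weights move
    return model

# Toy usage: a two-"block" model; pretend block 1 won the BMI ranking.
torch.manual_seed(0)
blocks = nn.ModuleList([nn.Linear(16, 16), nn.Linear(16, 16)])
model = nn.Sequential(blocks[0], nn.ReLU(), blocks[1], nn.Linear(16, 2))
lftf_finetune(model, blocks[1], [torch.randn(8, 16) for _ in range(5)], gap_loss)
```

Updating a single block rather than the whole network is what lets the method claim debiasing with little loss of general capability: the vast majority of weights never change.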
Problem

Research questions and friction points this paper is trying to address.

Mitigating gender bias in Large Language Models (LLMs)
Evaluating gender bias using GenBiasEval and GenHintEval datasets
Fine-tuning LLMs with LFTF algorithm to reduce bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Locate gender bias blocks via BMI score
Fine-tune top bias-related block specifically
Evaluate bias with the AFGB and UB metrics (see the sketch below)
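The exact AFGB-Score formula lives in the paper; as a rough illustration of an "absolutely fair" style of score, the sketch below renormalizes a model's probabilities on two gendered continuations and measures their deviation from an even split. The function name and formula are assumptions for illustration only.

```python
def afgb_style_score(p_male: float, p_female: float) -> float:
    """Illustrative 'absolutely fair' bias score (not the paper's formula):
    renormalize the two gendered probabilities and scale their distance
    from 0.5, so 0.0 means perfectly balanced and 1.0 fully one-sided."""
    total = p_male + p_female
    if total == 0:
        return 0.0
    return abs(p_male / total - 0.5) * 2

# A model putting 0.09 vs 0.01 mass on "he" vs "she" scores 0.8 (skewed).
print(afgb_style_score(0.09, 0.01))
```

UB-Score targets the complementary case: when the prompt itself contains a gender hint, the model *should* follow it, so consistency with the hint is rewarded rather than penalized.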
👥 Authors
Zhanyue Qin (Harbin Institute of Technology)
Yue Ding (Harbin Institute of Technology)
Deyuan Liu (Harbin Institute of Technology)
Qingbin Liu (Tencent)
Junxian Cai (Tencent)
Xi Chen (Tencent)
Zhiying Tu (Harbin Institute of Technology)
Dianhui Chu (Harbin Institute of Technology)
Cuiyun Gao (Harbin Institute of Technology)
Dianbo Sui (Harbin Institute of Technology)