LFTF: Locating First and Then Fine-Tuning for Mitigating Gender Bias in Large Language Models

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) inherit gender bias from their training data, necessitating systematic evaluation and mitigation. Method: The paper proposes a framework featuring (i) GenBiasEval and GenHintEval, a pair of benchmark datasets that jointly quantify bias strength and prompt-level consistency; (ii) two metrics, AFGB-Score (Absolutely Fair Gender Bias Score) and UB-Score (UnBias Score); (iii) BMI (Block Mitigating Importance Score), a block-level bias localization mechanism; and (iv) LFTF, a two-stage debiasing algorithm that first identifies the most bias-relevant block via BMI and then applies customized loss-driven local fine-tuning to that block. Results: Experiments show that LFTF significantly reduces gender bias (AFGB-Score improves by over 40%) while preserving general capabilities (zero-shot accuracy drops by less than 0.8%).
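To make the "locating" step concrete, here is a minimal sketch of BMI-style block ranking: silence each block's contribution in turn and measure how much a bias probe moves. Everything here is an illustrative assumption, not the paper's implementation: `ToyLM`, `bias_probe`, and `rank_blocks_by_bmi` are hypothetical names, and the probe is a stand-in for the role AFGB-Score plays on GenBiasEval.

```python
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Tiny stand-in for a transformer: a stack of residual MLP 'blocks'."""
    def __init__(self, dim=16, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_blocks)
        )
        self.head = nn.Linear(dim, 2)  # stand-in logits for "he" vs "she"

    def forward(self, x):
        for blk in self.blocks:
            x = x + blk(x)  # residual connection, as in transformer blocks
        return self.head(x)

def bias_probe(model, x):
    """Stand-in bias probe: mean absolute gap between the two gendered
    logits (AFGB-Score on GenBiasEval plays this role in the paper)."""
    with torch.no_grad():
        logits = model(x)
        return (logits[:, 0] - logits[:, 1]).abs().mean().item()

def rank_blocks_by_bmi(model, x):
    """Rank blocks by how much silencing each one moves the bias probe."""
    baseline = bias_probe(model, x)
    scores = []
    for idx, blk in enumerate(model.blocks):
        # Replace this block's output with zeros for one forward pass.
        handle = blk.register_forward_hook(lambda m, inp, out: torch.zeros_like(out))
        scores.append((idx, abs(bias_probe(model, x) - baseline)))
        handle.remove()
    # Descending order: the top entry is the block LFTF would fine-tune.
    return sorted(scores, key=lambda s: s[1], reverse=True)

torch.manual_seed(0)
model, x = ToyLM(), torch.randn(8, 16)
print(rank_blocks_by_bmi(model, x))
```

Zero-ablation via a forward hook is just one cheap proxy for "relevance to gender bias"; the paper defines BMI precisely, and any per-block sensitivity measure could slot into the same ranking loop.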

📝 Abstract
Nowadays, Large Language Models (LLMs) have attracted widespread attention due to their powerful performance. However, due to unavoidable exposure to socially biased data during training, LLMs tend to exhibit social biases, particularly gender bias. To better explore and quantify the degree of gender bias in LLMs, we propose a pair of datasets named GenBiasEval and GenHintEval. GenBiasEval is responsible for evaluating the degree of gender bias in LLMs, accompanied by an evaluation metric named AFGB-Score (Absolutely Fair Gender Bias Score). Meanwhile, GenHintEval is used to assess whether LLMs can provide responses consistent with prompts that contain gender hints, along with the accompanying evaluation metric UB-Score (UnBias Score). In addition, to mitigate gender bias in LLMs more effectively, we present the LFTF (Locating First and Then Fine-Tuning) algorithm. The algorithm first ranks specific LLM blocks by their relevance to gender bias in descending order using a metric called BMI (Block Mitigating Importance Score). Based on this ranking, the block most strongly associated with gender bias is then fine-tuned using a carefully designed loss function. Extensive experiments show that our proposed LFTF algorithm can significantly mitigate gender bias in LLMs while maintaining their general capabilities.
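The abstract only says the fine-tuning loss is "carefully designed", so the following is a hedged sketch of the second LFTF stage under stated assumptions: freeze every parameter except those of the BMI-top block, then minimize an illustrative loss that penalizes the gap between two gendered logits. `gap_loss` and `lftf_finetune` are hypothetical names; the paper's actual objective is more elaborate.

```python
import torch
import torch.nn as nn

def gap_loss(model: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Illustrative debiasing loss: squared gap between two gendered
    output logits (a placeholder for the paper's designed loss)."""
    logits = model(batch)
    return ((logits[:, 0] - logits[:, 1]) ** 2).mean()

def lftf_finetune(model, top_block, batches, loss_fn, lr=1e-4):
    """LFTF stage two (sketch): freeze all parameters, unfreeze only the
    block BMI ranked as most bias-relevant, and fine-tune that block."""
    for p in model.parameters():
        p.requires_grad_(False)
    for p in top_block.parameters():
        p.requires_grad_(True)
    opt = torch.optim.AdamW(top_block.parameters(), lr=lr)
    for batch in batches:
        opt.zero_grad()
        loss_fn(model, batch).backward()
        opt.step()  # only the located block's weights move
    return model

# Toy usage: a two-"block" model; pretend block 1 won the BMI ranking.
torch.manual_seed(0)
blocks = nn.ModuleList([nn.Linear(16, 16), nn.Linear(16, 16)])
model = nn.Sequential(blocks[0], nn.ReLU(), blocks[1], nn.Linear(16, 2))
lftf_finetune(model, blocks[1], [torch.randn(8, 16) for _ in range(5)], gap_loss)
```

Updating a single block rather than the whole network is what lets the method claim debiasing with little loss of general capability: the vast majority of weights never change.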
Problem

Research questions and friction points this paper is trying to address.

Mitigating gender bias in Large Language Models (LLMs)
Evaluating gender bias using GenBiasEval and GenHintEval datasets
Fine-tuning LLMs with LFTF algorithm to reduce bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Locate gender bias blocks via BMI score
Fine-tune top bias-related block specifically
Evaluate bias with the AFGB and UB metrics (see the sketch below)
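The exact AFGB-Score formula lives in the paper; as a rough illustration of an "absolutely fair" style of score, the sketch below renormalizes a model's probabilities on two gendered continuations and measures their deviation from an even split. The function name and formula are assumptions for illustration only.

```python
def afgb_style_score(p_male: float, p_female: float) -> float:
    """Illustrative 'absolutely fair' bias score (not the paper's formula):
    renormalize the two gendered probabilities and scale their distance
    from 0.5, so 0.0 means perfectly balanced and 1.0 fully one-sided."""
    total = p_male + p_female
    if total == 0:
        return 0.0
    return abs(p_male / total - 0.5) * 2

# A model putting 0.09 vs 0.01 mass on "he" vs "she" scores 0.8 (skewed).
print(afgb_style_score(0.09, 0.01))
```

UB-Score targets the complementary case: when the prompt itself contains a gender hint, the model *should* follow it, so consistency with the hint is rewarded rather than penalized.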
👥 Authors
Zhanyue Qin (Harbin Institute of Technology)
Yue Ding (Harbin Institute of Technology)
Deyuan Liu (Harbin Institute of Technology)
Qingbin Liu (Tencent)
Junxian Cai (Tencent)
Xi Chen (Tencent)
Zhiying Tu (Harbin Institute of Technology)
Dianhui Chu (Harbin Institute of Technology)
Cuiyun Gao (Harbin Institute of Technology)
Dianbo Sui (Harbin Institute of Technology)