KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the issue of social biases in large language models, which can reinforce harmful stereotypes and impede their safe deployment. The authors propose a lightweight debiasing paradigm based on “enhancement rather than suppression”: during inference, a small set of yes/no questions probes bias-related knowledge, attribution analysis identifies key neurons associated with biased responses, and their activations are selectively amplified. This approach requires no model retraining and only minimal data, yet generalizes effectively across diverse bias types and demographic groups. Experimental results demonstrate state-of-the-art debiasing performance across multiple benchmarks and mainstream large language models, with negligible degradation to the models’ general capabilities.
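The pipeline described above (probe with bias-knowledge questions, attribute responses to individual neurons, then amplify the selected activations at inference) can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the authors' implementation: the one-layer MLP, the random "probe" vectors standing in for encoded yes/no questions, the activation-times-output-weight attribution score, and the hyperparameters `k` and `alpha` are all illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer MLP: hidden activations h = relu(W x), scalar output y = v . h.
W = rng.normal(size=(16, 8))
v = rng.normal(size=16)

def forward(x, scale=None):
    h = np.maximum(W @ x, 0.0)      # hidden-neuron activations
    if scale is not None:
        h = h * scale               # inference-time enhancement, weights untouched
    return float(v @ h), h

# Step 1: attribution over probe inputs. Each probe vector is a placeholder
# for an encoded yes/no bias-knowledge question; the score |h_i * v_i| is a
# simple stand-in for the paper's attribution-based analysis.
probes = rng.normal(size=(5, 8))
scores = np.zeros(16)
for x in probes:
    _, h = forward(x)
    scores += np.abs(h * v)

# Step 2: select the top-k neurons and build an amplification mask.
k, alpha = 3, 2.0
top = np.argsort(scores)[-k:]
scale = np.ones(16)
scale[top] = alpha                  # amplify only the identified neurons

# Step 3: apply the enhancement at inference; no retraining occurs.
x_test = rng.normal(size=8)
y_base, _ = forward(x_test)
y_enhanced, _ = forward(x_test, scale=scale)
```

In a real LLM the same idea would be realized with forward hooks that rescale the chosen MLP activations during generation, which is what makes the method lightweight: only `k` scaling factors are introduced and all model weights stay frozen.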

📝 Abstract
Large language models (LLMs) exhibit social biases that reinforce harmful stereotypes, limiting their safe deployment. Most existing debiasing methods adopt a suppressive paradigm by modifying parameters, prompts, or neurons associated with biased behavior; however, such approaches are often brittle, weakly generalizable, data-inefficient, and prone to degrading general capability. We propose KnowBias, a lightweight and conceptually distinct framework that mitigates bias by strengthening, rather than suppressing, neurons encoding bias knowledge. KnowBias identifies these neurons using a small set of bias-knowledge questions via attribution-based analysis, and selectively enhances them at inference time. This design enables strong debiasing while preserving general capabilities, generalizes across bias types and demographics, and is highly data-efficient, requiring only a handful of simple yes/no questions and no retraining. Experiments across multiple benchmarks and LLMs demonstrate consistent state-of-the-art debiasing performance with minimal utility degradation. Data and code are available at https://github.com/JP-25/KnowBias.
Problem

Research questions and friction points this paper is trying to address.

social bias
large language models
stereotypes
debiasing
harmful bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

bias mitigation
neuron enhancement
attribution-based analysis
inference-time intervention
data-efficient debiasing
Jinhao Pan
Ph.D. Student in Computer Science, George Mason University
LLM · Responsible AI · Recommender System
Chahat Raj
George Mason University
NLP · Fairness · Ethics · Society & Culture
A. Mukherjee
Department of Computer Science, George Mason University, Fairfax, VA, USA
Sina Mansouri
Department of Computer Science, George Mason University, Fairfax, VA, USA
Bowen Wei
Department of Computer Science, George Mason University, Fairfax, VA, USA
Shloka Yada
Lightridge High School, Aldie, VA, USA
Ziwei Zhu
Assistant Professor at George Mason University
data mining · information retrieval · machine learning · responsible AI