Information Gain-Guided Causal Intervention for Autonomous Debiasing Large Language Models

📅 2025-04-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) are prone to bias inherited from instruction-tuning datasets, degrading their generalization performance. Existing debiasing methods rely heavily on human priors or in-context learning, limiting adaptability across diverse bias types. This paper proposes the first autonomous debiasing framework integrating information theory and causal inference: it quantifies bias impact via information gain, constructs a structural causal model (SCM), and performs counterfactual intervention to automatically reweight the data distribution prior to standard supervised fine-tuning. The method requires no manual annotations, external knowledge bases, or predefined bias assumptions, enabling adaptive correction of heterogeneous biases. Evaluated on multiple benchmarks, our approach significantly improves model generalization and reduces bias metrics by an average of 32.7%.

📝 Abstract
Despite significant progress, recent studies indicate that current large language models (LLMs) may still capture dataset biases and exploit them during inference, which harms their generalizability. Because dataset biases are diverse and bias suppression through in-context learning is insufficient, the effectiveness of previous prior-knowledge-based debiasing methods and in-context-learning-based automatic debiasing methods is limited. To address these challenges, we combine causal mechanisms with information theory and propose an information gain-guided causal intervention debiasing (IGCIDB) framework. The framework first uses an information gain-guided causal intervention method to automatically balance the distribution of the instruction-tuning dataset, and then applies standard supervised fine-tuning to train LLMs on the debiased dataset. Experimental results show that IGCIDB effectively debiases LLMs and improves their generalizability across different tasks.
Problem

Research questions and friction points this paper is trying to address.

Address LLMs' dataset bias for better generalizability
Combine causal mechanisms with information theory
Automatically balance instruction-tuning dataset distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Information gain-guided causal intervention method
Autonomous balancing of dataset distribution
Supervised fine-tuning on debiased data
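The rebalancing idea behind these contributions can be sketched as importance reweighting: give each example a weight proportional to P(label) / P(label | bias feature), so that after reweighting the biased feature carries no information about the label. This is a simplified illustration under that assumption, not the paper's actual intervention algorithm; `debias_weights` is a hypothetical helper name.

```python
from collections import Counter

def debias_weights(examples, feature, label_key="label"):
    """Sampling weights w = P(label) / P(label | feature).
    Over-represented (feature, label) pairs are down-weighted, so the
    reweighted conditional P(label | feature) matches the marginal P(label)."""
    n = len(examples)
    label_counts = Counter(ex[label_key] for ex in examples)
    joint = Counter((feature(ex), ex[label_key]) for ex in examples)
    feat_counts = Counter(feature(ex) for ex in examples)
    weights = []
    for ex in examples:
        f, y = feature(ex), ex[label_key]
        p_y = label_counts[y] / n
        p_y_given_f = joint[(f, y)] / feat_counts[f]
        weights.append(p_y / p_y_given_f)
    return weights

# Biased toy data: "not" co-occurs with label "c" twice as often as with "e".
data = [
    {"x": "not red", "label": "c"},
    {"x": "not big", "label": "c"},
    {"x": "not tall", "label": "e"},
    {"x": "red", "label": "e"},
    {"x": "big", "label": "e"},
    {"x": "small", "label": "c"},
]
has_not = lambda ex: "not" in ex["x"].split()
print(debias_weights(data, has_not))  # [0.75, 0.75, 1.5, 0.75, 0.75, 1.5]
```

The reweighted dataset could then be fed to any standard supervised fine-tuning loop, matching the "debias first, then fine-tune normally" recipe described above.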
👥 Authors
Zhouhao Sun (Harbin Institute of Technology)
Xiao Ding (Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China)
Li Du (Beijing Academy of Artificial Intelligence, Beijing, China)
Yunpeng Xu (Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China)
Yixuan Ma (Shanghai Jiaotong University)
Yang Zhao (Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China)
Bing Qin (Professor, Harbin Institute of Technology)
Ting Liu (Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China)