🤖 AI Summary
Large language models (LLMs) are prone to bias inherited from instruction-tuning datasets, degrading their generalization performance. Existing debiasing methods rely heavily on human priors or in-context learning, limiting adaptability across diverse bias types. This paper proposes the first autonomous debiasing framework integrating information theory and causal inference: it quantifies bias impact via information gain, constructs a structural causal model (SCM), and performs counterfactual intervention to automatically reweight the data distribution prior to standard supervised fine-tuning. The method requires no manual annotations, external knowledge bases, or predefined bias assumptions, enabling adaptive correction of heterogeneous biases. Evaluated on multiple benchmarks, our approach significantly improves model generalization and reduces bias metrics by an average of 32.7%.
📝 Abstract
Despite significant progress, recent studies indicate that current large language models (LLMs) may still capture dataset biases and exploit them during inference, leading to poor generalizability. However, because dataset biases are diverse and in-context-learning-based bias suppression is insufficient, the effectiveness of previous prior-knowledge-based debiasing methods and automatic debiasing methods based on in-context learning is limited. To address these challenges, we combine causal mechanisms with information theory and propose an information gain-guided causal intervention debiasing (IGCIDB) framework. The framework first applies an information gain-guided causal intervention to automatically and autonomously balance the distribution of the instruction-tuning dataset; it then trains LLMs on the debiased dataset with standard supervised fine-tuning. Experimental results show that IGCIDB effectively debiases LLMs and improves their generalizability across different tasks.
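The core idea of quantifying a bias feature's influence via information gain and then rebalancing the data can be illustrated with a minimal sketch. This is not the paper's implementation: the toy "bias feature" (a surface cue correlated with the label) and the intervention (reweighting each example toward the independent joint `p(f)·p(y)`, so the cue carries no information about the label) are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(pairs, weights=None):
    """I(feature; label) = H(label) - H(label | feature),
    estimated from (optionally weighted) co-occurrence counts."""
    if weights is None:
        weights = [1.0] * len(pairs)
    total = sum(weights)
    joint, feat, lab = Counter(), Counter(), Counter()
    for (f, y), w in zip(pairs, weights):
        joint[(f, y)] += w
        feat[f] += w
        lab[y] += w
    h_label = entropy([c / total for c in lab.values()])
    h_cond = sum(
        (feat[f] / total)
        * entropy([joint[(f, y)] / feat[f] for y in lab if (f, y) in joint])
        for f in feat
    )
    return h_label - h_cond

def debias_weights(pairs):
    """Reweight examples so the bias feature becomes independent of the
    label: target joint p(f, y) = p(f) * p(y), i.e. zero information gain."""
    n = len(pairs)
    joint = Counter(pairs)
    feat = Counter(f for f, _ in pairs)
    lab = Counter(y for _, y in pairs)
    return [(feat[f] / n) * (lab[y] / n) / (joint[(f, y)] / n)
            for f, y in pairs]

# Toy dataset where a surface cue ("neg" wording) correlates with label 0:
data = [("neg", 0)] * 8 + [("pos", 1)] * 8 + [("neg", 1)] * 2 + [("pos", 0)] * 2
print(round(information_gain(data), 3))                      # → 0.278 (biased)
print(round(information_gain(data, debias_weights(data)), 3))  # → 0.0
```

A training set rebalanced this way would then go through ordinary supervised fine-tuning; the paper's actual intervention additionally reasons over a structural causal model rather than a single hand-picked feature.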