Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) often fail to generalize to out-of-distribution (OOD) samples due to spurious correlations acquired during pretraining. To address this, we propose Causality-Aware Post-Training (CAPT), a novel post-training framework grounded in structural causal modeling. CAPT decomposes prediction into two unbiased steps, estimating event probabilities and performing counterfactual interventions, without requiring OOD annotations. Leveraging event decomposition and parameter-efficient updates, CAPT mitigates pretraining biases while avoiding the new biases that standard fine-tuning commonly introduces. Using only 100 in-distribution (ID) samples, CAPT fine-tunes a 3B-parameter LLM and achieves significant improvements over supervised fine-tuning (SFT) and even larger LLMs on both the CLadder and PrOntoQA benchmarks. Critically, it improves performance on ID and OOD test sets simultaneously. CAPT thus offers a lightweight, annotation-efficient, and theoretically grounded approach to enhancing LLM robustness and distributional generalization through causal reasoning.

📝 Abstract
While large language models (LLMs) have demonstrated remarkable capabilities in language modeling, recent studies reveal that they often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training. Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT). By decomposing a biased prediction into two unbiased steps, known as event estimation and event intervention, we reduce LLMs' pre-training biases without incurring additional fine-tuning biases, thus enhancing the model's generalization ability. Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks using only 100 ID fine-tuning samples, demonstrating the effectiveness and sample efficiency of CAPT.
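The two-step decomposition described above can be sketched in miniature. The snippet below is a hedged illustration of the general "estimate then intervene" pattern, not the paper's actual procedure: the toy probability tables, the `capt_style_predict` function, and the weather events are all invented for illustration; CAPT itself operates on LLM predictions via structural causal modeling, not on lookup tables.

```python
# Toy sketch of a two-step "event estimation, then event intervention"
# prediction, in the spirit of the decomposition described above.
# All names and probability tables are hypothetical illustrations.

def capt_style_predict(x, p_event_given_x, p_y_do_event):
    """Combine the two unbiased steps:
    Step 1 (event estimation):   P(E | X = x)
    Step 2 (event intervention): P(Y | do(E = e))
    yielding P(Y | x) = sum_e P(e | x) * P(Y | do(e)),
    so Y never conditions on X directly."""
    p_y = {}
    for e, p_e in p_event_given_x(x).items():
        for y, p in p_y_do_event(e).items():
            p_y[y] = p_y.get(y, 0.0) + p_e * p
    return p_y

# Invented toy distributions (assumptions, not from the paper).
def p_event_given_x(x):
    # Event estimation: infer the latent event from the observed input.
    return {"rain": 0.8, "no_rain": 0.2} if x == "clouds" else {"rain": 0.1, "no_rain": 0.9}

def p_y_do_event(e):
    # Event intervention: outcome distribution under do(E = e),
    # which cuts any spurious path from X to Y.
    return {"wet": 0.9, "dry": 0.1} if e == "rain" else {"wet": 0.05, "dry": 0.95}

probs = capt_style_predict("clouds", p_event_given_x, p_y_do_event)
# probs["wet"] = 0.8 * 0.9 + 0.2 * 0.05 = 0.73
```

The point of the split is that any shortcut feature in the raw input can only influence the estimated event probabilities, while the outcome model sees only the intervened-upon event, so the two steps are each unbiased in the sense sketched in the abstract.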
Problem

Research questions and friction points this paper is trying to address.

Mitigate spurious correlations in LLMs
Enhance generalization via causality-aware post-training
Reduce biases without fine-tuning biases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causality-aware post-training mitigates spurious correlations
Decomposes biased prediction into unbiased estimation and intervention
Enhances generalization with minimal fine-tuning samples
Shurui Gui
Ph.D. Student, Texas A&M University
Invariant learning, Explainability, LLM, GNNs
Shuiwang Ji
Department of Computer Science & Engineering, Texas A&M University