🤖 AI Summary
Existing language model (LM) capability evaluations suffer from severe confounding effects, prohibitively high retraining costs, and poorly understood causal mechanisms. Method: We propose the first causal representation learning framework for disentangling LMs' implicit capabilities, integrating linear structural equation modeling (SEM), common-cause control, and benchmark performance matrix decomposition to construct interpretable linear causal graphs across 1,500+ models on the Open LLM Leaderboard and six major benchmarks. Contribution/Results: We discover, for the first time, a directed causal pathway: "general problem solving → instruction following → mathematical reasoning," with foundation models identified as a critical common cause. Our framework substantially improves interpretability and causal fidelity in capability assessment, effectively correcting confounding bias in conventional rankings. It establishes a novel paradigm for causal attribution and controllable evolution of LM capabilities.
📄 Abstract
Faithful evaluation of language model capabilities is crucial for deriving actionable insights that can inform model development. However, rigorous causal evaluations in this domain face significant methodological challenges, including complex confounding effects and prohibitive computational costs associated with extensive retraining. To tackle these challenges, we propose a causal representation learning framework wherein observed benchmark performance is modeled as a linear transformation of a few latent capability factors. Crucially, these latent factors are identified as causally interrelated after appropriately controlling for the base model as a common confounder. Applying this approach to a comprehensive dataset encompassing over 1500 models evaluated across six benchmarks from the Open LLM Leaderboard, we identify a concise three-node linear causal structure that reliably explains the observed performance variations. Further interpretation of this causal structure provides substantial scientific insights beyond simple numerical rankings: specifically, we reveal a clear causal direction starting from general problem-solving capabilities, advancing through instruction-following proficiency, and culminating in mathematical reasoning ability. Our results underscore the essential role of carefully controlling base model variations during evaluation, a step critical to accurately uncovering the underlying causal relationships among latent model capabilities.
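The two-step recipe described above — factor-decompose the model-by-benchmark performance matrix, then estimate linear causal effects among the latent factors while controlling for the base model as a common cause — can be sketched on synthetic data. Everything below is a hypothetical illustration, not the paper's actual pipeline: the chain coefficients, noise scales, and the use of SVD for the decomposition and OLS for the SEM edges are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions mirroring the setting in the abstract:
# ~1,500 models, six benchmarks, three latent capability factors.
n_models, n_benchmarks, n_factors = 1500, 6, 3

# Simulate latent capabilities with the chain structure the paper reports,
# F1 (general problem solving) -> F2 (instruction following) -> F3 (math),
# plus a base-model covariate B acting as a common cause of all three.
B = rng.normal(size=n_models)                       # base-model effect (confounder)
F1 = 0.8 * B + rng.normal(scale=0.6, size=n_models)
F2 = 0.7 * F1 + 0.5 * B + rng.normal(scale=0.5, size=n_models)
F3 = 0.6 * F2 + 0.4 * B + rng.normal(scale=0.5, size=n_models)
F = np.column_stack([F1, F2, F3])

# Observed benchmark scores = linear mixing of the latent factors + noise.
W = rng.normal(size=(n_factors, n_benchmarks))      # factor loading matrix
X = F @ W + rng.normal(scale=0.1, size=(n_models, n_benchmarks))

# Step 1: recover a low-rank latent space from the performance matrix via SVD.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
F_hat = U[:, :n_factors] * S[:n_factors]            # estimated latent scores

def partial_coef(y, x, covariate):
    """OLS coefficient of x in the regression y ~ x + covariate + intercept,
    i.e. the effect of x on y after controlling for the covariate."""
    Z = np.column_stack([x, covariate, np.ones_like(x)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta[0]

# Step 2: linear SEM edges among the factors, controlling for B.
# Without controlling B, these coefficients would be inflated by confounding.
print("F1 -> F2 (B controlled):", round(partial_coef(F2, F1, B), 2))
print("F2 -> F3 (B controlled):", round(partial_coef(F3, F2, B), 2))
```

Note that the edge estimates stay close to the simulated values (0.7 and 0.6) only because `B` is included as a regressor; dropping it reproduces the confounding bias the abstract warns about, since `B` drives both ends of each edge.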