🤖 AI Summary
Existing language model (LM) capability evaluations suffer from severe confounding effects, prohibitively high retraining costs, and poorly understood causal mechanisms. Method: We propose the first causal representation learning framework for disentangling LMs' implicit capabilities, integrating linear structural equation modeling (SEM), common-cause control, and benchmark performance matrix decomposition to construct interpretable linear causal graphs across 1,500+ models on the Open LLM Leaderboard and six major benchmarks. Contribution/Results: We discover, for the first time, a directed causal pathway: "general problem solving → instruction following → mathematical reasoning," with foundation models identified as a critical common cause. Our framework substantially improves interpretability and causal fidelity in capability assessment, effectively correcting confounding bias in conventional rankings. It establishes a novel paradigm for causal attribution and controllable evolution of LM capabilities.
📄 Abstract
Faithful evaluation of language model capabilities is crucial for deriving actionable insights that can inform model development. However, rigorous causal evaluations in this domain face significant methodological challenges, including complex confounding effects and prohibitive computational costs associated with extensive retraining. To tackle these challenges, we propose a causal representation learning framework wherein observed benchmark performance is modeled as a linear transformation of a few latent capability factors. Crucially, these latent factors are identified as causally interrelated after appropriately controlling for the base model as a common confounder. Applying this approach to a comprehensive dataset encompassing over 1500 models evaluated across six benchmarks from the Open LLM Leaderboard, we identify a concise three-node linear causal structure that reliably explains the observed performance variations. Further interpretation of this causal structure provides substantial scientific insights beyond simple numerical rankings: specifically, we reveal a clear causal direction starting from general problem-solving capabilities, advancing through instruction-following proficiency, and culminating in mathematical reasoning ability. Our results underscore the essential role of carefully controlling base model variations during evaluation, a step critical to accurately uncovering the underlying causal relationships among latent model capabilities.
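The two-step recipe described above — factor-decompose the model-by-benchmark performance matrix, then estimate linear causal effects among the latent factors while controlling for the base model as a common cause — can be sketched on synthetic data. Everything below is a hypothetical illustration, not the paper's actual pipeline: the chain coefficients, noise scales, and the use of SVD for the decomposition and OLS for the SEM edges are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions mirroring the setting in the abstract:
# ~1,500 models, six benchmarks, three latent capability factors.
n_models, n_benchmarks, n_factors = 1500, 6, 3

# Simulate latent capabilities with the chain structure the paper reports,
# F1 (general problem solving) -> F2 (instruction following) -> F3 (math),
# plus a base-model covariate B acting as a common cause of all three.
B = rng.normal(size=n_models)                       # base-model effect (confounder)
F1 = 0.8 * B + rng.normal(scale=0.6, size=n_models)
F2 = 0.7 * F1 + 0.5 * B + rng.normal(scale=0.5, size=n_models)
F3 = 0.6 * F2 + 0.4 * B + rng.normal(scale=0.5, size=n_models)
F = np.column_stack([F1, F2, F3])

# Observed benchmark scores = linear mixing of the latent factors + noise.
W = rng.normal(size=(n_factors, n_benchmarks))      # factor loading matrix
X = F @ W + rng.normal(scale=0.1, size=(n_models, n_benchmarks))

# Step 1: recover a low-rank latent space from the performance matrix via SVD.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
F_hat = U[:, :n_factors] * S[:n_factors]            # estimated latent scores

def partial_coef(y, x, covariate):
    """OLS coefficient of x in the regression y ~ x + covariate + intercept,
    i.e. the effect of x on y after controlling for the covariate."""
    Z = np.column_stack([x, covariate, np.ones_like(x)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta[0]

# Step 2: linear SEM edges among the factors, controlling for B.
# Without controlling B, these coefficients would be inflated by confounding.
print("F1 -> F2 (B controlled):", round(partial_coef(F2, F1, B), 2))
print("F2 -> F3 (B controlled):", round(partial_coef(F3, F2, B), 2))
```

Note that the edge estimates stay close to the simulated values (0.7 and 0.6) only because `B` is included as a regressor; dropping it reproduces the confounding bias the abstract warns about, since `B` drives both ends of each edge.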