A Shared Geometry of Difficulty in Multilingual Language Models

📅 2026-01-19
🤖 AI Summary
This study investigates how multilingual large language models internally represent task difficulty and how that representation generalizes across languages. Leveraging the AMC subset of the Easy2Hard benchmark, translated into 21 languages, the authors analyze layer-wise representations through a linear probing framework and uncover a two-stage evolution of difficulty signals. In early layers, representations are language-agnostic, enabling strong cross-lingual generalization despite modest monolingual performance. In deeper layers, representations become language-specific, yielding high monolingual accuracy at the cost of reduced cross-lingual transferability. These findings extend abstract-conceptual-space accounts of LLM processing to meta-cognitive attributes, revealing a hierarchical structure underlying difficulty perception in multilingual models and delineating the boundaries of cross-lingual transfer for such meta-level knowledge.

📝 Abstract
Predicting problem difficulty in large language models (LLMs) refers to estimating how difficult a task is according to the model itself, typically by training linear probes on its internal representations. In this work, we study the multilingual geometry of problem difficulty in LLMs by training linear probes on the AMC subset of the Easy2Hard benchmark, translated into 21 languages. We find that difficulty-related signals emerge at two distinct stages of the model internals, corresponding to shallow (early-layer) and deep (later-layer) internal representations that exhibit functionally different behaviors. Probes trained on deep representations achieve high accuracy when evaluated on the same language but exhibit poor cross-lingual generalization. In contrast, probes trained on shallow representations generalize substantially better across languages, despite achieving lower within-language performance. Together, these results suggest that LLMs first form a language-agnostic representation of problem difficulty, which subsequently becomes language-specific. This closely aligns with existing findings in LLM interpretability showing that models tend to operate in an abstract conceptual space before producing language-specific outputs. We demonstrate that this two-stage representational process extends beyond semantic content to high-level meta-cognitive properties such as problem-difficulty estimation.
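The probing setup described in the abstract can be sketched in a few lines: fit a linear classifier on hidden states from one language, then score it both within-language and on another language's representations. This is a minimal illustration with synthetic stand-in data, not the authors' code; the hidden-state width, label construction, and the shared "difficulty direction" are all assumptions made for the sketch.

```python
# Minimal sketch of layer-wise difficulty probing with cross-lingual
# evaluation. Synthetic arrays stand in for LLM hidden states; in the
# paper these would be residual-stream activations at a chosen layer
# for each translated AMC problem.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

d_model = 64                 # hidden-state width (assumed for the sketch)
n_train, n_test = 200, 100

# Hypothetical shared "difficulty" direction in representation space.
w_true = rng.normal(size=d_model)

# Stand-ins for representations of the same problems in two languages.
X_en = rng.normal(size=(n_train, d_model))   # e.g. English hidden states
X_fr = rng.normal(size=(n_test, d_model))    # e.g. French hidden states
y_en = (X_en @ w_true > 0).astype(int)       # easy/hard labels
y_fr = (X_fr @ w_true > 0).astype(int)

# Train the linear probe on one language only.
probe = LogisticRegression(max_iter=1000).fit(X_en, y_en)

# Within-language accuracy vs. cross-lingual transfer accuracy.
within = probe.score(X_en, y_en)
cross = probe.score(X_fr, y_fr)
print(f"within-language={within:.2f}  cross-lingual={cross:.2f}")
```

In the paper's framing, repeating this per layer would show the two-stage pattern: shallow-layer probes transfer well across languages, deep-layer probes score higher within-language but transfer poorly.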
Problem

Research questions and friction points this paper is trying to address: multilingual language models, problem difficulty, representation geometry, cross-lingual generalization, meta-cognitive properties.

Innovation

Methods, ideas, or system contributions that make the work stand out: multilingual language models, problem difficulty, linear probes, cross-lingual generalization, representational geometry.
Stefano Civelli (The University of Queensland)
Pietro Bernardelle (The University of Queensland)
Nicolo Brunello (Polytechnic University of Milan)
Gianluca Demartini (Professor at The University of Queensland; Information Retrieval, Semantic Web, Human Computation, Crowdsourcing)