🤖 AI Summary
This study investigates how multilingual large language models internally represent task difficulty and how those representations generalize across languages. Using the AMC subset of the Easy2Hard benchmark, translated into 21 languages, the authors analyze layer-wise representations through a linear probing framework and uncover a two-stage evolution of difficulty signals. In early layers, representations are language-agnostic, enabling strong cross-lingual generalization despite modest monolingual performance. In deeper layers, representations become language-specific, yielding high monolingual accuracy at the cost of reduced cross-lingual transferability. These findings extend abstract conceptual space theory to metacognitive attributes for the first time, revealing a hierarchical structure underlying difficulty perception in multilingual models and delineating the boundaries of cross-lingual transfer for such meta-level knowledge.
📝 Abstract
Problem-difficulty prediction in large language models (LLMs) refers to estimating how difficult a task is according to the model itself, typically by training linear probes on its internal representations. In this work, we study the multilingual geometry of problem difficulty in LLMs by training linear probes on the AMC subset of the Easy2Hard benchmark, translated into 21 languages. We find that difficulty-related signals emerge at two distinct stages of the model's internals, corresponding to shallow (early-layer) and deep (later-layer) representations, which exhibit functionally different behaviors. Probes trained on deep representations achieve high accuracy when evaluated on the same language but generalize poorly across languages. In contrast, probes trained on shallow representations generalize substantially better across languages, despite lower within-language performance. Together, these results suggest that LLMs first form a language-agnostic representation of problem difficulty, which subsequently becomes language-specific. This aligns closely with existing findings in LLM interpretability showing that models tend to operate in an abstract conceptual space before producing language-specific outputs. We demonstrate that this two-stage representational process extends beyond semantic content to high-level metacognitive properties such as problem-difficulty estimation.
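To make the probing setup concrete, here is a minimal sketch of the within- vs cross-language evaluation described above. It uses synthetic stand-ins for the model's hidden states (a shared "difficulty direction" at shallow layers, language-specific directions at deep layers) rather than real LLM activations, and all names (`shared`, `deep_dir_en`, etc.) are illustrative assumptions, not the paper's code.

```python
# Hedged sketch: layer-wise linear probes for difficulty, evaluated
# within- and cross-language. Hidden states are synthetic stand-ins
# (random directions), NOT activations from a real model.
import numpy as np

rng = np.random.default_rng(0)
n, d_model = 500, 64
difficulty = rng.uniform(0.0, 1.0, size=n)  # scalar difficulty labels

def fit_probe(X, y, l2=1e-2):
    """Closed-form ridge-regularized linear probe:
    w = (X^T X + l2 * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ y)

def r2(X, y, w):
    """Coefficient of determination of the probe's predictions on (X, y)."""
    resid = y - X @ w
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def states(direction):
    """Fake hidden states: difficulty encoded along one direction, plus noise."""
    return np.outer(difficulty, direction) + rng.normal(scale=0.05, size=(n, d_model))

# Shallow layers: both "languages" encode difficulty along a SHARED direction.
shared = rng.normal(size=d_model)
shallow_en, shallow_fr = states(shared), states(shared)

# Deep layers: each language uses its OWN direction (orthogonalized so the
# two directions do not overlap).
deep_dir_en = rng.normal(size=d_model)
deep_dir_fr = rng.normal(size=d_model)
deep_dir_fr -= (deep_dir_fr @ deep_dir_en) / (deep_dir_en @ deep_dir_en) * deep_dir_en
deep_en, deep_fr = states(deep_dir_en), states(deep_dir_fr)

# Train probes on the "English" states, evaluate within- and cross-language.
w_shallow = fit_probe(shallow_en, difficulty)
w_deep = fit_probe(deep_en, difficulty)

print(f"shallow within-language R^2: {r2(shallow_en, difficulty, w_shallow):.2f}")
print(f"shallow cross-language  R^2: {r2(shallow_fr, difficulty, w_shallow):.2f}")
print(f"deep    within-language R^2: {r2(deep_en, difficulty, w_deep):.2f}")
print(f"deep    cross-language  R^2: {r2(deep_fr, difficulty, w_deep):.2f}")
```

Under this toy construction the shallow probe transfers across languages while the deep probe does not, mirroring the qualitative pattern reported in the abstract; the real experiments, of course, probe actual per-layer activations of a multilingual LLM.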