🤖 AI Summary
Existing LLM evaluation methods collapse multidimensional capabilities into a single scalar score, obscuring structural differences in model competencies and the intrinsic difficulty distribution of test items. To address this, we propose JE-IRT (Joint Embedding Item Response Theory), a framework that jointly embeds language models and test items in a shared geometric space: an item's semantics are encoded by its direction, its difficulty by its radial magnitude (norm), and a model's capability by its projection strength along semantic directions. Because JE-IRT requires no manually annotated categories, it can surface the models' internal capability structure, which turns out to align only partially with human-defined topics. Once the space is learned, a novel model can be added by fitting a single embedding, and the geometry supports interpretable analysis of out-of-distribution performance. Experiments demonstrate that JE-IRT accurately estimates item difficulty, explains out-of-distribution behavior through directional alignment, provides intuitive capability visualizations, and improves both evaluation efficiency and interpretability.
📝 Abstract
Standard LLM evaluation practices compress diverse abilities into single scores, obscuring their inherently multidimensional nature. We present JE-IRT, a geometric item-response framework that embeds both LLMs and questions in a shared space. For question embeddings, the direction encodes semantics and the norm encodes difficulty, while correctness on each question is determined by the geometric interaction between the model and question embeddings. This geometry replaces a global ranking of LLMs with topical specialization and enables smooth variation across related questions. Building on this framework, our experimental results reveal that out-of-distribution behavior can be explained through directional alignment, and that larger norms consistently indicate harder questions. Moreover, JE-IRT naturally supports generalization: once the space is learned, new LLMs are added by fitting a single embedding. The learned space further reveals an LLM-internal taxonomy that only partially aligns with human-defined subject categories. JE-IRT thus establishes a unified and interpretable geometric lens that connects LLM abilities with the structure of questions, offering a distinctive perspective on model evaluation and generalization.
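The geometric interaction described above can be illustrated with a minimal sketch. Note the exact response function is not specified in the abstract; the logistic (2PL-style IRT) link, the `p_correct` name, and the toy vectors below are illustrative assumptions, chosen only to show how direction, norm, and projection could interact:

```python
import numpy as np

def p_correct(model_emb, question_emb, eps=1e-9):
    """Hypothetical IRT-style response probability from the joint geometry.

    Assumed parameterization (not the paper's exact form): the question's
    unit direction carries semantics, its norm carries difficulty, and the
    model's projection onto that direction carries ability along that axis.
    """
    difficulty = np.linalg.norm(question_emb)            # norm = difficulty
    direction = question_emb / (difficulty + eps)        # unit direction = semantics
    ability = model_emb @ direction                      # projection strength = capability
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))  # logistic link (assumed)

# Toy usage: a model strong along the first semantic axis.
model = np.array([3.0, 0.2])
easy_q = np.array([0.5, 0.0])   # small norm -> easy; direction = first axis
hard_q = np.array([4.0, 0.0])   # large norm -> hard; same semantic direction
print(p_correct(model, easy_q) > p_correct(model, hard_q))  # True
```

Under this sketch, two questions sharing a direction differ only in difficulty, so correctness varies smoothly along that semantic axis, while a model's ability is direction-dependent rather than a single global score.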