Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement

📅 2025-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing critical challenges in LLM evaluation (insufficient human-centeredness, the limitations of static task-based benchmarks, and a lack of interpretability), this paper introduces a psychometric paradigm and establishes a comprehensive framework for LLM psychometrics. Methodologically, it integrates classical reliability and validity analysis, item response theory, multidimensional latent variable modeling, and behavioral experimental design, augmented by prompt engineering and semantic response analysis, to quantitatively assess psychological constructs such as personality, values, and cognitive style. Key contributions include: (1) a paradigmatic shift from assessing "problem-solving ability" to assessing "human-like thinking"; (2) the first structured LLM psychometrics knowledge graph and an open-source resource repository (Awesome-LLM-Psychometrics); and (3) an interdisciplinary theoretical foundation and empirical guidance for generalizable, interpretable, and human-centered AI evaluation.

📝 Abstract
The rapid advancement of large language models (LLMs) has outpaced traditional evaluation methodologies. This progress presents novel challenges, such as measuring human-like psychological constructs, moving beyond static and task-specific benchmarks, and establishing human-centered evaluation. These challenges intersect with Psychometrics, the science of quantifying the intangible aspects of human psychology, such as personality, values, and intelligence. This survey introduces and synthesizes the emerging interdisciplinary field of LLM Psychometrics, which leverages psychometric instruments, theories, and principles to evaluate, understand, and enhance LLMs. We systematically explore the role of Psychometrics in shaping benchmarking principles, broadening evaluation scopes, refining methodologies, validating results, and advancing LLM capabilities. This paper integrates diverse perspectives to provide a structured framework for researchers across disciplines, enabling a more comprehensive understanding of this nascent field. Ultimately, we aim to provide actionable insights for developing future evaluation paradigms that align with human-level AI and promote the advancement of human-centered AI systems for societal benefit. A curated repository of LLM psychometric resources is available at https://github.com/valuebyte-ai/Awesome-LLM-Psychometrics.
Problem

Research questions and friction points this paper is trying to address.

Evaluating human-like psychological traits in LLMs
Moving beyond static benchmarks for LLM assessment
Developing human-centered evaluation methods for AI systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging psychometric instruments for LLM evaluation
Broadening evaluation scopes with psychometric principles
Developing human-centered AI evaluation paradigms
Haoran Ye
AI PhD @ Peking University
Agent · AI Safety and Alignment · AI Psychology · Learn to Optimize · Evolutionary Computation

Jing Jin
State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University

Yuhang Xie
Peking University

Xin Zhang
School of Psychological and Cognitive Sciences, Peking University; Key Laboratory of Machine Perception (Ministry of Education), Peking University

Guojie Song
Tenured Professor (Research), Peking University
Psychological AI · AI Safety & Value Alignment · Agent Cognition & Behavioral Modeling · LLM & GML