🤖 AI Summary
Existing evaluations of psychological traits in large language models (LLMs) lack a systematic psychometric framework. Method: The paper presents a comprehensive psychometric analysis of LLMs covering six dimensions: assessment instruments, domain-specific psychological datasets, stability and consistency metrics, personality simulation, behavioral modeling, and cross-task empirical validation. Through controlled prompt-based experiments, it documents reproducible yet task-dependent personality tendencies across multiple models and identifies structural misalignments between classical psychological scales and LLM capabilities. Contribution/Results: The paper proposes the first standardized evaluation framework integrating LLM-tailored assessment tools, benchmark datasets, and personality/behavioral modeling methods. This framework establishes a theoretical foundation and practical methodology for developing interpretable, robust, and generalizable psychological assessments of LLMs, thereby advancing research on trustworthy human-AI collaboration grounded in empirically validated psychological mechanisms.
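To make the prompt-based protocol concrete, here is a minimal sketch of how such a personality probe might be administered: present Likert-scaled items, parse the numeric replies, and reverse-key where needed. The instruction text, the items, and the `query_model` stub are illustrative assumptions for this sketch, not the paper's actual instrument or API.

```python
from statistics import mean

# Illustrative Likert instruction; a real study would use a validated
# inventory (e.g., BFI or IPIP items), not these stand-in statements.
LIKERT_PROMPT = (
    "Rate how well the statement describes you on a scale from 1 "
    "(strongly disagree) to 5 (strongly agree). Reply with the number only.\n"
    "Statement: {item}"
)

# (item text, reverse_keyed) pairs for one hypothetical trait scale.
ITEMS = [
    ("I am the life of the party.", False),
    ("I tend to be quiet around strangers.", True),
]

def query_model(prompt: str) -> str:
    """Placeholder for a chat-completion API call; swap in a real client."""
    return "3"  # deterministic stub so the sketch runs end to end

def score_scale(items) -> float:
    """Administer each item once and average the (reverse-keyed) responses."""
    scores = []
    for text, reverse in items:
        reply = query_model(LIKERT_PROMPT.format(item=text))
        value = int(reply.strip()[0])  # naive parse of a "1".."5" reply
        scores.append(6 - value if reverse else value)  # reverse-key on 1..5
    return mean(scores)

print("trait score:", score_scale(ITEMS))
```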
📝 Abstract
As large language models (LLMs) are increasingly used in human-centered tasks, assessing their psychological traits is crucial for understanding their social impact and ensuring trustworthy AI alignment. Existing reviews cover some aspects of this research, but several important areas lack systematic treatment, including diverse psychological tests, LLM-specific psychological datasets, and applications of LLMs endowed with psychological traits. To address this gap, we systematically review six key dimensions of applying psychological theories to LLMs: (1) assessment tools; (2) LLM-specific datasets; (3) evaluation metrics (consistency and stability); (4) empirical findings; (5) personality simulation methods; and (6) LLM-based behavior simulation. Our analysis highlights both the strengths and limitations of current methods: while some LLMs exhibit reproducible personality patterns under specific prompting schemes, substantial variability remains across tasks and settings. In light of methodological challenges, such as mismatches between psychological tools and LLM capabilities and inconsistencies in evaluation practices, this study proposes future directions for developing more interpretable, robust, and generalizable psychological assessment frameworks for LLMs.
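As one concrete instance of the consistency and stability metrics in dimension (3), the sketch below re-administers a single item under paraphrased prompts and reports score dispersion; the paraphrases and the model stub are hypothetical placeholders, not taken from the reviewed protocols.

```python
from statistics import mean, pstdev

# Hypothetical paraphrases of one administration prompt; a stability check
# asks whether scores survive such surface-level rewording.
PARAPHRASES = [
    "On a scale of 1-5, how much do you agree with: '{item}'? Number only.",
    "Rate your agreement with '{item}' from 1 (not at all) to 5 (fully).",
    "From 1 to 5, how strongly does '{item}' describe you? Reply with a digit.",
]

def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call; deterministic stub for the sketch."""
    return "4"

def prompt_stability(item: str) -> tuple[float, float]:
    """Mean score and dispersion across paraphrased administrations;
    low dispersion suggests a prompt-robust (stable) response pattern."""
    scores = [int(query_model(p.format(item=item)).strip()[0])
              for p in PARAPHRASES]
    return mean(scores), pstdev(scores)

print(prompt_stability("I am the life of the party."))
```

In practice, such checks would also vary sampling temperature, item order, and task framing, which is where the cross-task variability noted above tends to surface.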