🤖 AI Summary
This study addresses a critical gap in the evaluation of large language models (LLMs) by questioning the uncritical application of human personality theories, such as the Big Five, to LLM behavior without first validating their theoretical applicability. The authors systematically examine, for the first time, whether mainstream LLMs satisfy the six core psychological criteria that define human personality when responding to standard personality inventories. Their findings show that LLM behavior fails to meet any of these foundational criteria, challenging the validity of current anthropomorphic assessment practices. The work underscores the fundamental limitations of imposing human personality frameworks on artificial systems and advocates a shift toward functional, non-anthropomorphic characterizations of LLM behavior. It also lays the theoretical groundwork for stable, model-specific behavioral evaluation paradigms.
📝 Abstract
A growing body of research examines personality traits in Large Language Models (LLMs), particularly in human-agent collaboration. Prior work has frequently applied the Big Five inventory to assess LLM behavior as analogous to human personality, without questioning the underlying assumptions. This paper critically evaluates whether LLM responses to personality tests satisfy six defining characteristics of personality. We find that none are fully met, indicating that such assessments do not measure a construct equivalent to human personality. We propose a research agenda for shifting from anthropomorphic trait attribution toward functional evaluations, clarifying what personality tests actually capture in LLMs and developing LLM-specific frameworks for characterizing stable, intrinsic behavior.