Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models

📅 2025-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the systematic bias and meta-knowledge contamination that arise from self-report questionnaires in LLM personality assessment. We propose a relational-contextual multi-observer agent framework that simulates interpersonal interactions across diverse social contexts—such as family, friendship, and workplace—to elicit behavioral manifestations of the target LLM. Role-specific observer agents independently rate the LLM along the Big Five personality dimensions, and scores are aggregated across 5–7 observers to balance reliability and contextual sensitivity. Our work pioneers the shift from introspective self-report to external behavioral observation in LLM personality evaluation, provides the first empirical evidence of consistent, significant bias in LLM self-reports, and demonstrates that relational context critically shapes personality perception. Experiments show that our framework substantially reduces non-systematic bias and enhances both the robustness and construct validity of personality assessment.

📝 Abstract
There is a growing interest in assessing the personality traits of large language models (LLMs). However, traditional personality assessments based on self-report questionnaires may fail to capture their true behavioral nuances due to inherent biases and meta-knowledge contamination. This paper introduces a novel multi-observer framework for LLM personality assessment that draws inspiration from informant-report methods in psychology. Instead of relying solely on self-assessments, our approach employs multiple observer agents, each configured with a specific relationship context (e.g., family, friend, or workplace), to simulate interactive scenarios with a subject LLM. These observers engage in dialogues and subsequently provide ratings across the Big Five personality dimensions. Our experiments reveal that LLMs exhibit systematic biases in self-reported personality ratings. Moreover, aggregating observer ratings effectively reduces non-systematic biases and achieves optimal reliability with 5-7 observers. The findings highlight the significant impact of relationship context on personality perception and demonstrate that a multi-observer paradigm yields a more robust and context-sensitive evaluation of LLM personality traits.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLM personality traits beyond biased self-reports
Reducing biases via multi-observer agent rating aggregation
Evaluating context impact on LLM personality perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-observer framework for LLM personality assessment
Observer agents simulate interactive relationship contexts
Aggregating ratings reduces biases and enhances reliability
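The aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the observer agents, the 1-5 rating scale, and the relationship labels are assumptions for the example; only the idea of averaging Big Five ratings across 5-7 observers to cancel non-systematic noise comes from the paper.

```python
from statistics import mean

BIG_FIVE = ["openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism"]

def aggregate_ratings(observer_ratings):
    """Average each Big Five dimension across observer agents.

    observer_ratings: one dict per observer mapping trait -> score
    (assumed here to be a 1-5 Likert rating). Averaging independent
    observers cancels non-systematic (random) rating noise, which is
    why reliability improves as observers are added.
    """
    return {trait: mean(r[trait] for r in observer_ratings)
            for trait in BIG_FIVE}

# Hypothetical ratings from three observers in different relationship contexts
ratings = [
    {"openness": 4, "conscientiousness": 5, "extraversion": 3,
     "agreeableness": 4, "neuroticism": 2},  # family-context observer
    {"openness": 5, "conscientiousness": 4, "extraversion": 2,
     "agreeableness": 5, "neuroticism": 2},  # friend-context observer
    {"openness": 4, "conscientiousness": 5, "extraversion": 2,
     "agreeableness": 4, "neuroticism": 3},  # workplace-context observer
]
print(aggregate_ratings(ratings))
```

In the paper's setting the per-observer scores would come from each agent's post-dialogue questionnaire rather than hand-written dicts, and 5-7 such observers are aggregated rather than three.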
Yin Jou Huang
Graduate School of Informatics, Kyoto University, Kyoto, Japan
Rafik Hadfi
Kyoto University
Artificial Intelligence · Multiagent Systems · Social Simulation · Game Theory