Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models

📅 2025-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the systematic bias and meta-knowledge contamination that arise from self-report questionnaires in LLM personality assessment. We propose a relational-contextual multi-observer agent framework that simulates interpersonal interactions across diverse social contexts—such as family, friendship, and workplace—to elicit behavioral manifestations of the target LLM. Role-specific observer agents independently rate the LLM along the Big Five personality dimensions, and scores are aggregated across 5–7 observers to balance reliability and contextual sensitivity. Our work pioneers the shift from introspective self-report to external behavioral observation in LLM personality evaluation, provides the first empirical evidence of consistent, significant bias in LLM self-reports, and demonstrates that relational context critically shapes personality perception. Experiments show that our framework substantially reduces non-systematic bias and enhances both the robustness and construct validity of personality assessment.

📝 Abstract
There is a growing interest in assessing the personality traits of large language models (LLMs). However, traditional personality assessments based on self-report questionnaires may fail to capture their true behavioral nuances due to inherent biases and meta-knowledge contamination. This paper introduces a novel multi-observer framework for LLM personality assessment that draws inspiration from informant-report methods in psychology. Instead of relying solely on self-assessments, our approach employs multiple observer agents, each configured with a specific relationship context (e.g., family, friend, or workplace), to simulate interactive scenarios with a subject LLM. These observers engage in dialogues and subsequently provide ratings across the Big Five personality dimensions. Our experiments reveal that LLMs exhibit systematic biases in self-reported personality ratings. Moreover, aggregating observer ratings effectively reduces non-systematic biases and achieves optimal reliability with 5-7 observers. The findings highlight the significant impact of relationship context on personality perception and demonstrate that a multi-observer paradigm yields a more robust and context-sensitive evaluation of LLM personality traits.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLM personality traits beyond biased self-reports
Reducing biases via multi-observer agent rating aggregation
Evaluating context impact on LLM personality perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-observer framework for LLM personality assessment
Observer agents simulate interactive relationship contexts
Aggregating ratings reduces biases and enhances reliability
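The aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the observer agents, the 1-5 rating scale, and the relationship labels are assumptions for the example; only the idea of averaging Big Five ratings across 5-7 observers to cancel non-systematic noise comes from the paper.

```python
from statistics import mean

BIG_FIVE = ["openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism"]

def aggregate_ratings(observer_ratings):
    """Average each Big Five dimension across observer agents.

    observer_ratings: one dict per observer mapping trait -> score
    (assumed here to be a 1-5 Likert rating). Averaging independent
    observers cancels non-systematic (random) rating noise, which is
    why reliability improves as observers are added.
    """
    return {trait: mean(r[trait] for r in observer_ratings)
            for trait in BIG_FIVE}

# Hypothetical ratings from three observers in different relationship contexts
ratings = [
    {"openness": 4, "conscientiousness": 5, "extraversion": 3,
     "agreeableness": 4, "neuroticism": 2},  # family-context observer
    {"openness": 5, "conscientiousness": 4, "extraversion": 2,
     "agreeableness": 5, "neuroticism": 2},  # friend-context observer
    {"openness": 4, "conscientiousness": 5, "extraversion": 2,
     "agreeableness": 4, "neuroticism": 3},  # workplace-context observer
]
print(aggregate_ratings(ratings))
```

In the paper's setting the per-observer scores would come from each agent's post-dialogue questionnaire rather than hand-written dicts, and 5-7 such observers are aggregated rather than three.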
Yin Jou Huang
Graduate School of Informatics, Kyoto University, Kyoto, Japan
Rafik Hadfi
Kyoto University
Artificial Intelligence · Multiagent Systems · Social Simulation · Game Theory