🤖 AI Summary
This work investigates whether large language models (LLMs) exhibit stable, measurable personality traits. Method: The authors introduce TRAIT, a psychometrically grounded benchmark of 8K multiple-choice questions for assessing LLM personality, built from two validated human questionnaires, the Big Five Inventory (BFI) and the Short Dark Triad (SD-3), and expanded with the ATOMIC-10X knowledge graph to cover diverse real-world scenarios. Contribution/Results: The study shows that LLMs display distinct personality tendencies that are consistent across tasks and strongly shaped by their training data, particularly alignment tuning, while current prompting techniques have limited ability to elicit certain traits (e.g., high psychopathy or low conscientiousness). TRAIT outperforms existing LLM personality tests across four key metrics (content validity, internal validity, refusal rate, and reliability), establishing a more reliable and valid framework for measuring personality in LLMs.
📝 Abstract
Recent advancements in Large Language Models (LLMs) have led to their adoption as conversational agents across various domains. We wonder: can personality tests be applied to these agents to analyze their behavior, as they are to humans? We introduce TRAIT, a new benchmark consisting of 8K multiple-choice questions designed to assess the personality of LLMs. TRAIT is built on two psychometrically validated human questionnaires, the Big Five Inventory (BFI) and the Short Dark Triad (SD-3), expanded with the ATOMIC-10X knowledge graph to cover a variety of real-world scenarios. TRAIT also outperforms existing personality tests for LLMs in terms of reliability and validity, achieving the highest scores across four key metrics: Content Validity, Internal Validity, Refusal Rate, and Reliability. Using TRAIT, we reveal two notable insights into the personalities of LLMs: 1) LLMs exhibit distinct and consistent personalities, which are highly influenced by their training data (e.g., data used for alignment tuning), and 2) current prompting techniques have limited effectiveness in eliciting certain traits, such as high psychopathy or low conscientiousness, suggesting the need for further research in this direction.
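To make the evaluation idea concrete, here is a minimal sketch (not the authors' actual code; the item fields, option labels, and scoring rule are illustrative assumptions) of how trait scores and a refusal rate might be aggregated over TRAIT-style multiple-choice responses:

```python
from collections import defaultdict

def score_responses(responses):
    """Aggregate per-trait scores and an overall refusal rate.

    Each response is a dict (hypothetical schema) with the trait the item
    probes and the model's choice: 'high' (trait-affirming), 'low'
    (trait-denying), or 'refuse' (the model declined to answer).
    """
    counts = defaultdict(lambda: {"high": 0, "low": 0})
    refusals = 0
    for r in responses:
        if r["choice"] == "refuse":
            refusals += 1
        else:
            counts[r["trait"]][r["choice"]] += 1
    # Trait score: fraction of answered items where the model picked the
    # trait-affirming option; refusal rate is computed over all items.
    scores = {
        trait: c["high"] / (c["high"] + c["low"])
        for trait, c in counts.items()
        if c["high"] + c["low"] > 0
    }
    refusal_rate = refusals / len(responses)
    return scores, refusal_rate

# Mock responses for illustration only
demo = [
    {"trait": "Agreeableness", "choice": "high"},
    {"trait": "Agreeableness", "choice": "low"},
    {"trait": "Psychopathy", "choice": "low"},
    {"trait": "Psychopathy", "choice": "refuse"},
]
scores, refusal_rate = score_responses(demo)
print(scores, refusal_rate)  # {'Agreeableness': 0.5, 'Psychopathy': 0.0} 0.25
```

A higher refusal rate on a benchmark indicates less complete (and thus less usable) personality measurements, which is why the paper treats refusal rate as one of its four evaluation metrics.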