InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of current large language models in personality simulation: they typically rely on indirect data such as questionnaires or brief interviews rather than direct evidence of individuals' authentic words and actions. The authors present the first large-scale evaluation framework grounded in real-world interviews, constructing a dataset of 671,000 question-answer pairs extracted from 23,000 interviews with 1,000 public figures. They introduce a four-dimensional automatic evaluation metric encompassing content similarity, factual consistency, personality alignment, and knowledge retention. Experimental results demonstrate that models trained on real interview data significantly outperform baselines relying solely on biographical texts or parametric knowledge. The study further reveals a trade-off between stylistic expression and factual fidelity, mediated by retrieval augmentation and temporal modeling, thereby offering a scalable and evaluable pathway toward personalized language models.

📝 Abstract
Simulating real personalities with large language models requires grounding generation in authentic personal data. Existing evaluation approaches rely on demographic surveys, personality questionnaires, or short AI-led interviews as proxies, but lack direct assessment against what individuals actually said. We address this gap with an interview-grounded evaluation framework for personality simulation at a large scale. We extract over 671,000 question-answer pairs from 23,000 verified interview transcripts across 1,000 public personalities, each with an average of 11.5 hours of interview content. We propose a multi-dimensional evaluation framework with four complementary metrics measuring content similarity, factual consistency, personality alignment, and factual knowledge retention. Through systematic comparison, we demonstrate that methods grounded in real interview data substantially outperform those relying solely on biographical profiles or the model's parametric knowledge. We further reveal a trade-off in how interview data is best utilized: retrieval-augmented methods excel at capturing personality style and response quality, while chronological-based methods better preserve factual consistency and knowledge retention. Our evaluation framework enables principled method selection based on application requirements, and our empirical findings provide actionable insights for advancing personality simulation research.
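The abstract's core recommendation is to select a simulation method by weighting the four evaluation dimensions according to application needs: retrieval-augmented methods score higher on style and response quality, chronological methods on factual consistency and knowledge retention. A minimal sketch of that selection logic is below; the class name, scores, and weights are all illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class SimEval:
    """Hypothetical per-method scores on the paper's four dimensions (0-1 scale)."""
    content_similarity: float
    factual_consistency: float
    personality_alignment: float
    knowledge_retention: float

    def weighted_score(self, weights):
        """Combine the four dimensions with application-specific weights."""
        dims = (self.content_similarity, self.factual_consistency,
                self.personality_alignment, self.knowledge_retention)
        return sum(w * d for w, d in zip(weights, dims)) / sum(weights)

# Illustrative numbers only -- not reported results from the paper.
retrieval_aug = SimEval(0.78, 0.65, 0.82, 0.70)  # stronger style/quality
chronological = SimEval(0.72, 0.80, 0.74, 0.83)  # stronger facts/retention

# A style-focused chatbot weights personality alignment most heavily...
style_weights = (1.0, 0.5, 2.0, 0.5)
# ...while a factual assistant weights consistency and retention.
fact_weights = (0.5, 2.0, 0.5, 2.0)

best_for_style = max((retrieval_aug, chronological),
                     key=lambda e: e.weighted_score(style_weights))
best_for_facts = max((retrieval_aug, chronological),
                     key=lambda e: e.weighted_score(fact_weights))
```

With these (assumed) scores, the retrieval-augmented method wins under style-oriented weights and the chronological method wins under fact-oriented weights, mirroring the trade-off the abstract describes.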
Problem

Research questions and friction points this paper is trying to address.

personality simulation
interview-grounded evaluation
large language models
factual consistency
authentic personal data
Innovation

Methods, ideas, or system contributions that make the work stand out.

interview-grounded evaluation
personality simulation
retrieval-augmented generation
factual consistency
large language models
Yu Li
Salesforce Research
Dialog Systems · Natural Language Processing

Pranav Narayanan Venkit
Salesforce Research

Yada Pruksachatkun
New York University

Chien-Sheng Wu
Salesforce Research