InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of current large language models in personality simulation: they typically rely on indirect data such as questionnaires or brief interviews rather than direct evidence of individuals' authentic words and actions. The authors present the first large-scale evaluation framework grounded in real-world interviews, constructing a dataset of 671,000 question-answer pairs extracted from 23,000 interviews with 1,000 public figures. They introduce a four-dimensional automatic evaluation metric encompassing content similarity, factual consistency, personality alignment, and knowledge retention. Experimental results demonstrate that models trained on real interview data significantly outperform baselines relying solely on biographical texts or parametric knowledge. The study further reveals a trade-off between stylistic expression and factual fidelity, mediated by retrieval augmentation and temporal modeling, thereby offering a scalable and evaluable pathway toward personalized language models.

📝 Abstract
Simulating real personalities with large language models requires grounding generation in authentic personal data. Existing evaluation approaches rely on demographic surveys, personality questionnaires, or short AI-led interviews as proxies, but lack direct assessment against what individuals actually said. We address this gap with an interview-grounded evaluation framework for personality simulation at a large scale. We extract over 671,000 question-answer pairs from 23,000 verified interview transcripts across 1,000 public personalities, each with an average of 11.5 hours of interview content. We propose a multi-dimensional evaluation framework with four complementary metrics measuring content similarity, factual consistency, personality alignment, and factual knowledge retention. Through systematic comparison, we demonstrate that methods grounded in real interview data substantially outperform those relying solely on biographical profiles or the model's parametric knowledge. We further reveal a trade-off in how interview data is best utilized: retrieval-augmented methods excel at capturing personality style and response quality, while chronological-based methods better preserve factual consistency and knowledge retention. Our evaluation framework enables principled method selection based on application requirements, and our empirical findings provide actionable insights for advancing personality simulation research.
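The abstract's core recommendation is to select a simulation method by weighting the four evaluation dimensions according to application needs: retrieval-augmented methods score higher on style and response quality, chronological methods on factual consistency and knowledge retention. A minimal sketch of that selection logic is below; the class name, scores, and weights are all illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class SimEval:
    """Hypothetical per-method scores on the paper's four dimensions (0-1 scale)."""
    content_similarity: float
    factual_consistency: float
    personality_alignment: float
    knowledge_retention: float

    def weighted_score(self, weights):
        """Combine the four dimensions with application-specific weights."""
        dims = (self.content_similarity, self.factual_consistency,
                self.personality_alignment, self.knowledge_retention)
        return sum(w * d for w, d in zip(weights, dims)) / sum(weights)

# Illustrative numbers only -- not reported results from the paper.
retrieval_aug = SimEval(0.78, 0.65, 0.82, 0.70)  # stronger style/quality
chronological = SimEval(0.72, 0.80, 0.74, 0.83)  # stronger facts/retention

# A style-focused chatbot weights personality alignment most heavily...
style_weights = (1.0, 0.5, 2.0, 0.5)
# ...while a factual assistant weights consistency and retention.
fact_weights = (0.5, 2.0, 0.5, 2.0)

best_for_style = max((retrieval_aug, chronological),
                     key=lambda e: e.weighted_score(style_weights))
best_for_facts = max((retrieval_aug, chronological),
                     key=lambda e: e.weighted_score(fact_weights))
```

With these (assumed) scores, the retrieval-augmented method wins under style-oriented weights and the chronological method wins under fact-oriented weights, mirroring the trade-off the abstract describes.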
Problem

Research questions and friction points this paper is trying to address.

personality simulation
interview-grounded evaluation
large language models
factual consistency
authentic personal data
Innovation

Methods, ideas, or system contributions that make the work stand out.

interview-grounded evaluation
personality simulation
retrieval-augmented generation
factual consistency
large language models
Yu Li
Salesforce Research
Dialog Systems · Natural Language Processing

Pranav Narayanan Venkit
Salesforce Research

Yada Pruksachatkun
New York University

Chien-Sheng Wu
Salesforce Research