A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the insufficient evaluation of large language models' (LLMs) personalized reasoning and generation capabilities in multi-turn dialogues. We introduce PersonaConvBench, the first large-scale benchmark integrating explicit persona modeling with multi-turn dialogue structure. It spans ten Reddit domains and supports three core tasks: sentence classification, impact regression, and user-centric text generation. Our contributions are threefold: (1) the first systematic integration of explicit persona representations with dynamic dialogue context; (2) a unified, cross-domain, multi-task, user-centered evaluation framework; and (3) a fine-grained assessment protocol with aligned prompting for both commercial and open-source LLMs. Experiments demonstrate that incorporating personalized dialogue history substantially improves model performance; for example, sentiment classification accuracy increases by 198% over non-dialogue baselines. The dataset, code, and full experimental results are publicly released.

📝 Abstract
We present PersonaConvBench, a large-scale benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). Unlike existing work that focuses on either personalization or conversational structure in isolation, PersonaConvBench integrates both, offering three core tasks: sentence classification, impact regression, and user-centric text generation across ten diverse Reddit-based domains. This design enables systematic analysis of how personalized conversational context shapes LLM outputs in realistic multi-user scenarios. We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements, including a 198 percent relative gain over the best non-conversational baseline in sentiment classification. By releasing PersonaConvBench with evaluations and code, we aim to support research on LLMs that adapt to individual styles, track long-term context, and produce contextually rich, engaging responses.
Problem

Research questions and friction points this paper is trying to address.

Evaluating personalized reasoning in multi-turn LLM conversations
Integrating personalization and conversational structure in benchmarks
Analyzing how context shapes LLM outputs in diverse scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates persona modeling with multi-turn conversational structure in one benchmark
Uses Reddit-based domains for diverse evaluation
Incorporates personalized history for performance improvement
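The core comparison behind the reported gains is between prompting a model with the target text alone and prompting it with the user's persona and dialogue history prepended. The sketch below illustrates that setup; the function name and prompt format are hypothetical, chosen for illustration, and are not taken from the released PersonaConvBench code.

```python
# Hypothetical sketch of the two prompting conditions the benchmark contrasts:
# a non-conversational baseline vs. a prompt carrying personalized dialogue
# history. All names and formats here are illustrative assumptions.

def build_prompt(target_text, persona=None, history=None):
    """Assemble a sentiment-classification prompt, optionally prepending
    a user persona and prior dialogue turns as personalized context."""
    parts = []
    if persona:
        parts.append(f"User persona: {persona}")
    if history:
        parts.append("Dialogue history:")
        parts.extend(f"- {turn}" for turn in history)
    parts.append(f"Classify the sentiment of this comment: {target_text}")
    return "\n".join(parts)

# Non-conversational baseline: the target comment only.
baseline = build_prompt("This update broke everything again.")

# Personalized condition: persona plus earlier turns from the same thread.
personalized = build_prompt(
    "This update broke everything again.",
    persona="long-time contributor, usually constructive",
    history=["I filed a bug report last week.",
             "The maintainers were responsive."],
)
```

Holding the task instruction fixed while varying only the prepended context is what lets the benchmark attribute performance differences to personalized history rather than to prompt wording.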
👥 Authors

Li Li (University of Southern California)
Peilin Cai (University of Southern California)
Ryan A. Rossi (Adobe Research)
Franck Dernoncourt (NLP/ML Researcher, MIT PhD)
B. Kveton (Adobe Research)
Junda Wu (University of California San Diego)
Tong Yu (Adobe Research)
Linxin Song (University of Southern California)
Tiankai Yang (University of Southern California)
Yuehan Qin (University of Southern California)
Nesreen K. Ahmed (Senior Principal Scientist, Cisco AI Research; Intel Labs; Purdue University)
Samyadeep Basu (Research Scientist at Adobe Research; prev. UMD, MSR)
Subhojyoti Mukherjee (Adobe Research)
Ruiyi Zhang (Adobe Research)
Zhengmian Hu (Adobe Research)
Bo Ni (Vanderbilt University)
Yuxiao Zhou (Virginia Polytechnic Institute and State University)
Zichao Wang (Adobe Research)
Yue Huang (University of Notre Dame)
Yu Wang (University of Oregon)
Xiangliang Zhang (Leonard C. Bettex Collegiate Professor, Computer Science and Engineering, University of Notre Dame)
Philip S. Yu (Professor of Computer Science, University of Illinois at Chicago)
Xiyang Hu (PhD, Carnegie Mellon University)
Yue Zhao (University of Southern California)