🤖 AI Summary
This study investigates whether persona prompting genuinely improves the response quality of large language models and how its effects vary across question types and domains. The authors compare four prompting conditions on 1,140 open-ended questions spanning 38 expert roles and six domains. The four conditions are no role prompt, a generic domain-expert prompt, embedding-based role retrieval, and a hybrid method that combines embedding search with LLM-based role selection. Responses are evaluated metric by metric rather than through a single aggregate score. The findings show that persona prompting is not universally beneficial. Instead, it entails a systematic trade-off: it increases expertise depth while reducing clarity. Its effectiveness depends on question type and domain. Role prompting performs best on advisory questions and in domains such as medicine and psychology, whereas baseline prompting performs better on conceptual and explanatory questions in finance, legal, science, and technology. The hybrid retrieval strategy significantly outperforms embedding-only role selection, although better role retrieval does not remove the expertise-versus-clarity trade-off.
📝 Abstract
Persona prompting is widely used to steer large language models, yet its practical value remains unclear. Prior work often evaluates persona prompting using aggregate scores, making it difficult to determine whether expert-role prompting consistently improves response quality or instead changes responses along different quality dimensions. We study this question through a controlled comparison of four prompting conditions across 1,140 open-ended questions spanning 38 expert roles and six domains: no role prompt, a generic domain-expert prompt, embedding-based role retrieval, and a hybrid retrieval method combining embedding search with LLM-based role selection. Aggregate results show only small overall differences between conditions. However, metric-level analysis reveals a consistent tradeoff that aggregate averages obscure: role prompting systematically increases expertise depth while reducing clarity. These effects are highly conditional rather than universal. Role prompting performs best on advisory questions and in domains such as medicine and psychology, where structured expert framing and risk communication are intrinsically valuable. In contrast, baseline prompting performs better on conceptual and explanatory questions in finance, legal, science, and technology domains, where concise plain-language explanation is more important. We further show that hybrid retrieval significantly improves over embedding-only role selection, although better role retrieval does not eliminate the broader expertise-depth versus clarity tradeoff. Overall, our findings suggest that persona prompting primarily reshapes response characteristics rather than broadly improving capability, and that multi-metric evaluation is necessary for understanding its effects.
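The abstract describes the hybrid condition only at a high level: embedding search narrows the candidate roles, and an LLM then picks one. It does not state the embedding model, the shortlist size, or the selector prompt. The following is therefore a minimal sketch of that general two-stage pattern under explicit assumptions, not the authors' implementation. A toy bag-of-words vector stands in for a real embedding model. `select` is a hypothetical callable standing in for the LLM role-selection step. The role names and descriptions are illustrative and are not the paper's 38 roles.

```python
import math
import re
from collections import Counter
from typing import Callable

# Hypothetical stand-in for the LLM step: receives the question and a shortlist, returns one role name.
Selector = Callable[[str, list[str]], str]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: replaces a real sentence-embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embedding_top_k(question: str, roles: dict[str, str], k: int) -> list[str]:
    """Embedding-only retrieval: rank roles by similarity of their descriptions to the question."""
    q = embed(question)
    # sorted() is stable, so roles with equal scores keep their original order.
    ranked = sorted(roles, key=lambda name: cosine(q, embed(roles[name])), reverse=True)
    return ranked[:k]


def hybrid_select(question: str, roles: dict[str, str], select: Selector, k: int = 3) -> str:
    """Hybrid retrieval: embedding search builds a shortlist, then a selector picks from it."""
    shortlist = embedding_top_k(question, roles, k)
    choice = select(question, shortlist)
    # Guard against a selector answer outside the shortlist by falling back to the top hit.
    return choice if choice in shortlist else shortlist[0]


# Illustrative (hypothetical) role catalogue.
ROLES = {
    "cardiologist": "heart chest pain blood pressure cardiac doctor",
    "clinical psychologist": "anxiety stress therapy mental health",
    "tax attorney": "tax law legal filing deductions",
    "financial planner": "retirement savings investment budget",
}

if __name__ == "__main__":
    q = "Should I see a doctor about chest pain when I feel stress?"
    print(embedding_top_k(q, ROLES, 2))  # ['cardiologist', 'clinical psychologist']
    # A stub selector that always prefers the psychologist, standing in for an LLM judgment.
    print(hybrid_select(q, ROLES, lambda question, shortlist: "clinical psychologist", k=2))
```

In this sketch the two stages play different roles. The embedding stage is cheap and bounds the candidate set. The selector can then override a purely lexical or semantic match when the question's intent points elsewhere, which is one plausible reason a hybrid could beat embedding-only selection.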