Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior studies report inconsistent effects of persona (role) prompting (e.g., "expert in math") on large language model (LLM) performance, with unclear boundary conditions and underlying mechanisms. Method: The authors distill three ideal properties of persona prompts—performance advantage of expert personas, robustness to irrelevant persona attributes, and fidelity to persona attributes—and construct a quantifiable evaluation framework, running controlled experiments and ablation studies across nine state-of-the-art LLMs and 27 diverse tasks. Results: Expert personas usually produce positive or non-significant performance changes, but models are highly sensitive to irrelevant persona details, with performance drops of almost 30 percentage points; attribute effects such as higher education, specialization, and domain-relatedness are often inconsistent or negligible across tasks, and the proposed robustness mitigations help only the largest, most capable models. The findings motivate more careful persona design and evaluation schemes that reflect the intended effects of persona usage.

📝 Abstract
Expert persona prompting -- assigning roles such as "expert in math" to language models -- is widely used for task improvement. However, prior work shows mixed results on its effectiveness, and does not consider when and why personas should improve performance. We analyze the literature on persona prompting for task improvement and distill three desiderata: 1) performance advantage of expert personas, 2) robustness to irrelevant persona attributes, and 3) fidelity to persona attributes. We then evaluate 9 state-of-the-art LLMs across 27 tasks with respect to these desiderata. We find that expert personas usually lead to positive or non-significant performance changes. Surprisingly, models are highly sensitive to irrelevant persona details, with performance drops of almost 30 percentage points. In terms of fidelity, we find that while higher education, specialization, and domain-relatedness can boost performance, their effects are often inconsistent or negligible across tasks. We propose mitigation strategies to improve robustness -- but find they only work for the largest, most capable models. Our findings underscore the need for more careful persona design and for evaluation schemes that reflect the intended effects of persona usage.
Problem

Research questions and friction points this paper is trying to address.

Evaluating expert persona prompting effectiveness on task performance
Assessing robustness to irrelevant persona attributes in LLMs
Measuring fidelity and consistency of persona attribute effects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defining three desiderata for persona effectiveness
Evaluating 9 LLMs across 27 tasks systematically
Proposing mitigation strategies for persona robustness
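The evaluation logic behind the first two desiderata can be sketched as a simple prompt ablation: score a task under a baseline prompt, an expert-persona prompt, and an expert-persona prompt padded with irrelevant attributes, then compare accuracies. This is a minimal hypothetical illustration; the prompt templates, persona wording, and example accuracy numbers below are assumptions, not the paper's actual materials.

```python
# Hypothetical sketch of the three prompt conditions compared in the paper.
# Templates and persona details are illustrative, not the authors' prompts.
BASELINE = "Answer the question: {q}"
EXPERT = "You are an expert in {domain}. Answer the question: {q}"
IRRELEVANT = ("You are an expert in {domain} who enjoys gardening and "
              "collects stamps. Answer the question: {q}")

def build_prompts(question, domain):
    """Return the baseline, expert-persona, and irrelevant-detail prompts."""
    return {
        "baseline": BASELINE.format(q=question),
        "expert": EXPERT.format(domain=domain, q=question),
        "irrelevant": IRRELEVANT.format(domain=domain, q=question),
    }

def performance_advantage(acc):
    """Desideratum 1: expert persona should not underperform the baseline."""
    return acc["expert"] - acc["baseline"]

def irrelevance_robustness_gap(acc):
    """Desideratum 2: irrelevant details should leave accuracy unchanged,
    so a large positive gap signals fragility."""
    return acc["expert"] - acc["irrelevant"]

prompts = build_prompts("What is 7 * 8?", "math")

# Illustrative per-condition accuracies on some task (made-up numbers):
acc = {"baseline": 0.70, "expert": 0.72, "irrelevant": 0.45}
print(round(performance_advantage(acc), 2))       # small or no gain
print(round(irrelevance_robustness_gap(acc), 2))  # large gap = not robust
```

Fidelity (the third desideratum) would extend this by varying a single persona attribute, such as education level or domain-relatedness, and checking whether accuracy moves in the expected direction.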