Whose Name Comes Up? III: Persona Prompting Effects in LLM-Based Scholar Recommendation

📅 2026-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the poorly understood sources of output variability in LLM-based scholar recommendation, in particular the lack of systematic evaluation of how prompt design (language, location, and role-and-task cues) affects the results. The authors propose a benchmark that audits 43 LLMs across persona prompts (language, location, role-and-task) and contexts (field, seniority, k), comparing recommended scholars against Semantic Scholar over six scientific disciplines to measure factuality, coverage, diversity, and parity. Findings indicate that model choice drives basic technical quality, context drives factuality and parity, and location drives diversity: for instance, prompts referencing South Africa yield less factual lists, whereas prompts referencing Japan yield highly factual but homogeneous lists skewed toward highly productive scholars. The work disentangles model- and prompt-level factors and concludes that prompt design is a non-trivial axis of scholar discovery that should be audited alongside model choice.
📝 Abstract
Large language models (LLMs) are increasingly used as scholar recommenders, shaping who is seen as an expert in academia. Existing audits remain English-centric, single-discipline, and persona-agnostic, leaving the source of output variability poorly understood. To this end, we propose a benchmark that disentangles the effects of model choice and prompt design on recommendations. We audit 43 LLMs by varying persona prompts (language, location, role-and-task) and context (field, seniority, k). Recommended scholars are compared against Semantic Scholar over six scientific disciplines to measure technical quality (factuality, coverage) and social representativeness (diversity, parity). Basic technical quality is driven by model choice, factuality and parity by context, and diversity by location. South Africa prompts yield less factual lists, while Japan prompts yield highly factual but homogeneous lists skewed toward highly productive scholars. Prompt design is thus a non-trivial axis of LLM-based scholar discovery and should be systematically audited alongside model choice.
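The audit design described in the abstract can be pictured as a grid of persona and context factors, with each returned list scored against a reference set. The following is a minimal sketch under stated assumptions, not the authors' implementation: the factor values, the prompt wording, and the helpers `build_prompts` and `factuality` are all hypothetical, and factuality is approximated here as the share of recommended names found in a reference list, which may differ from the paper's definition.

```python
from itertools import product

# Hypothetical factor levels. The abstract names the axes (language, location,
# role-and-task, field, seniority, k) but, apart from the two locations below,
# not their values.
LANGUAGES = ["English"]
LOCATIONS = ["South Africa", "Japan"]
ROLES = ["a researcher looking for collaborators"]
FIELDS = ["physics"]
SENIORITIES = ["early-career", "senior"]
KS = [5]


def build_prompts():
    """Cross every persona factor with every context factor."""
    prompts = []
    for lang, loc, role, field, seniority, k in product(
        LANGUAGES, LOCATIONS, ROLES, FIELDS, SENIORITIES, KS
    ):
        # Assumed prompt template, for illustration only.
        text = (
            f"You are {role} based in {loc}. Answer in {lang}. "
            f"Recommend {k} {seniority} scholars in {field}."
        )
        prompts.append({"location": loc, "seniority": seniority, "k": k, "text": text})
    return prompts


def factuality(recommended, reference_names):
    """Share of recommended names that appear in the reference set.

    Assumption: exact, case-insensitive name matching stands in for
    whatever matching against Semantic Scholar the paper performs.
    """
    if not recommended:
        return 0.0
    reference = {name.casefold() for name in reference_names}
    hits = sum(1 for name in recommended if name.casefold() in reference)
    return hits / len(recommended)
```

Scoring each model's output per grid cell in this way would let scores be grouped by model, by location, or by context, which is the kind of comparison the abstract reports.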
Problem

Research questions and friction points this paper is trying to address.

scholar recommendation
large language models
persona prompting
algorithmic auditing
representativeness
Innovation

Methods, ideas, or system contributions that make the work stand out.

persona prompting
scholar recommendation
large language models
algorithmic auditing
representational fairness