🤖 AI Summary
This study investigates the robustness of large language models (LLMs) in factual question answering under varying inquiry personas: whether model responses are biased by subjective cues such as user identity, professional background, or belief stance. We introduce the first persona construction method grounded in real human interactions and integrate it into a controlled experimental framework to systematically evaluate the consistency of factual responses across major LLMs. Our experiments uncover novel failure modes triggered by persona cues, including answer refusal, fabricated constraints, and role confusion. Results show significant fluctuations in factual accuracy across personas, confirming that sensitivity to user identity undermines factual consistency. This work is the first to both reveal and quantify how inquiry personas impair LLM reliability, establishing inquiry persona testing as a paradigm for robustness evaluation and alignment-oriented optimization.
📝 Abstract
Large Language Models (LLMs) should answer factual questions truthfully, grounded in objective knowledge, regardless of user context such as self-disclosed personal information or system personalization. In this paper, we present the first systematic evaluation of LLM robustness to inquiry personas, i.e., user profiles that convey attributes like identity, expertise, or belief. While prior work has primarily focused on adversarial inputs or distractors for robustness testing, we evaluate plausible, human-centered inquiry persona cues that users disclose in real-world interactions. We find that such cues can meaningfully alter QA accuracy and trigger failure modes such as refusals, hallucinated limitations, and role confusion. These effects highlight how model sensitivity to user framing can compromise factual reliability, and they position inquiry persona testing as an effective tool for robustness evaluation.
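To make the evaluation setup concrete, the sketch below illustrates the general idea of a persona-consistency probe: ask the same factual question with and without different self-disclosed persona cues and check whether the answer stays correct. This is a minimal illustration, not the authors' actual benchmark; the persona texts, the example question, and the `query_model` callable are hypothetical placeholders.

```python
# Illustrative sketch of a persona-robustness probe (assumptions: persona
# wording, example question, and the query_model callable are placeholders,
# not the paper's actual framework).

from typing import Callable, Dict

# Hypothetical persona cues a user might disclose before asking a question.
PERSONAS = {
    "baseline": "",
    "novice": "I'm a high-school student just starting to learn about this. ",
    "expert": "As a professor who has studied this topic for 20 years, ",
    "skeptic": "I personally doubt the mainstream view on this, but ",
}


def evaluate_persona_robustness(
    query_model: Callable[[str], str],
    question: str,
    gold_answer: str,
) -> Dict[str, bool]:
    """Ask the same factual question under each persona framing and record
    whether the gold answer still appears in the model's response."""
    results = {}
    for name, prefix in PERSONAS.items():
        response = query_model(prefix + question)
        results[name] = gold_answer.lower() in response.lower()
    return results


if __name__ == "__main__":
    # Stub model for demonstration; replace with a real LLM call.
    def fake_model(prompt: str) -> str:
        return "The boiling point of water at sea level is 100 degrees Celsius."

    report = evaluate_persona_robustness(
        fake_model,
        "What is the boiling point of water at sea level?",
        "100 degrees Celsius",
    )
    # A robust model should answer correctly under every persona framing;
    # divergence across personas signals the sensitivity studied here.
    print(report)
```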