Epistemic Fragility in Large Language Models: Prompt Framing Systematically Modulates Misinformation Correction

📅 2025-11-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates how prompt framing systematically affects the ability of large language models (LLMs) to correct misinformation, and introduces the concept of "epistemic fragility": the susceptibility of correction performance to prompt design factors, namely open-mindedness, user intent, user role, and complexity. Method: In controlled experiments, we constructed 320 structured prompts across 10 misinformation domains and collected 2,560 responses from four frontier models, including Gemini 2.5 Pro and Claude Sonnet 4.5. Responses were coded for correction strength and rectification strategy, then analyzed statistically. Contribution/Results: We find that creative intent, expert role assignment, and closed framing significantly reduce both the likelihood of correction and the effectiveness of the strategies used; Gemini 2.5 Pro had 74% lower odds of strong correction than Claude Sonnet 4.5. These findings challenge current guardrails and call for alignment strategies that prioritize epistemic integrity over conversational compliance.
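To make the reported design sizes concrete, the sketch below enumerates a factorial prompt grid. It is an illustration under stated assumptions, not the authors' construction: only "closed" framing, "creative" intent, and "expert" role are named in the summary, so every other level label, the number of levels per factor, and the placeholder domain names are hypothetical and were chosen only so that the grid reproduces the reported total of 320 prompts.

```python
from itertools import product

# Hypothetical factor levels. Only "closed", "creative", and "expert" appear
# in the summary; all other labels and the level counts are assumptions
# picked so that the grid size matches the reported 320 prompts.
FACTORS = {
    "framing": ["open", "closed"],
    "intent": ["informational", "creative", "persuasive", "personal"],
    "role": ["layperson", "expert"],
    "complexity": ["simple", "complex"],
}
DOMAINS = [f"domain_{i}" for i in range(1, 11)]  # ten placeholder domains
N_MODELS = 4           # reported: four frontier LLMs
N_RESPONSES = 2560     # reported: total responses collected


def build_grid():
    """Return one dict per prompt condition: domain plus one level per factor."""
    names = list(FACTORS)
    grid = []
    for domain in DOMAINS:
        for levels in product(*FACTORS.values()):
            grid.append({"domain": domain, **dict(zip(names, levels))})
    return grid


grid = build_grid()
conditions_per_domain = len(grid) // len(DOMAINS)
responses_per_prompt = N_RESPONSES // len(grid)

print("prompts:", len(grid))
print("conditions per domain:", conditions_per_domain)
print("responses per prompt:", responses_per_prompt)
print("responses per prompt per model (if spread evenly):",
      responses_per_prompt // N_MODELS)
```

The arithmetic itself follows from the reported totals: 320 prompts over 10 domains gives 32 conditions per domain, and 2,560 responses over 320 prompts gives 8 responses per prompt. How those 8 responses were distributed across the four models is not stated here, so the even split in the last line is only one possibility.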

📝 Abstract

As large language models (LLMs) rapidly displace traditional expertise, their capacity to correct misinformation has become a core concern. We investigate the idea that prompt framing systematically modulates misinformation correction, a property we term 'epistemic fragility'. We manipulated prompts by open-mindedness, user intent, user role, and complexity. Across ten misinformation domains, we generated 320 prompts and elicited 2,560 responses from four frontier LLMs, which were coded for strength of misinformation correction and use of rectification strategies. Analyses showed that creative intent, expert role, and closed framing significantly reduced both the likelihood of correction and the effectiveness of the strategies used. We also found striking model differences: Gemini 2.5 Pro had 74% lower odds of strong correction than Claude Sonnet 4.5. These findings highlight epistemic fragility as an important structural property of LLMs, challenging current guardrails and underscoring the need for alignment strategies that prioritize epistemic integrity over conversational compliance.
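Because the model comparison is stated in odds rather than probabilities, a short sketch may help with interpretation. "74% lower odds" corresponds to an odds ratio of 1 - 0.74 = 0.26, which is not the same as a 74% lower probability. The reference probabilities below are hypothetical values chosen for illustration; the abstract does not report the underlying correction rates.

```python
def odds(p):
    """Convert a probability in (0, 1) to odds."""
    return p / (1.0 - p)


def prob_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def apply_odds_ratio(p_ref, odds_ratio):
    """Probability implied by scaling the reference group's odds by odds_ratio."""
    return prob_from_odds(odds(p_ref) * odds_ratio)


# "74% lower odds" expressed as an odds ratio relative to the reference model.
ODDS_RATIO = 1.0 - 0.74

# Hypothetical reference probabilities of strong correction (not from the paper).
for p_ref in (0.2, 0.5, 0.8):
    p_other = apply_odds_ratio(p_ref, ODDS_RATIO)
    print(f"reference p={p_ref:.2f} -> comparison p={p_other:.3f}")
```

The implied gap in probability depends on the reference rate: the same odds ratio produces a different absolute and relative drop at each baseline, which is why the odds-based wording should not be paraphrased as a difference in correction probability.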

Problem

Research questions and friction points this paper is trying to address.

LLMs' misinformation correction varies with prompt framing

Epistemic fragility challenges current AI guardrail effectiveness

Model differences significantly impact misinformation correction strength

Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt framing modulates misinformation correction

Creative intent reduces correction effectiveness

Model differences affect epistemic fragility

🔎 Similar Papers

ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence