How does Misinformation Affect Large Language Model Behaviors and Preferences?

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates how susceptible large language models (LLMs) are to misinformation under knowledge conflicts and stylistic variations. To this end, we introduce MisBench, the first million-scale misinformation benchmark, which enables fine-grained, quantitative evaluation of shifts in LLM behavior and knowledge preference. We propose the “Reconstruct to Discriminate” (RtD) paradigm, which enhances discriminative capability via semantic reconstruction and improves robustness through adversarial training and preference-decoupled evaluation. Experimental results show that while mainstream LLMs possess baseline fact-checking ability, they remain significantly vulnerable to knowledge conflicts and stylistic perturbations. RtD achieves an average 12.7% improvement in detection accuracy on MisBench and generalizes substantially better across diverse misinformation scenarios.

📝 Abstract

Large Language Models (LLMs) have shown remarkable capabilities in knowledge-intensive tasks, yet they remain vulnerable when they encounter misinformation. Existing studies have explored the role of LLMs in combating misinformation, but a fine-grained analysis of which aspects of LLM behavior misinformation influences, and to what extent, is still lacking. To bridge this gap, we present MisBench, currently the largest and most comprehensive benchmark for evaluating LLMs' behavior and knowledge preference toward misinformation. MisBench consists of 10,346,712 pieces of misinformation and uniquely considers both knowledge-based conflicts and stylistic variations in misinformation. Empirical results reveal that while LLMs demonstrate comparable abilities in discerning misinformation, they remain susceptible to knowledge conflicts and stylistic variations. Based on these findings, we further propose a novel approach called Reconstruct to Discriminate (RtD) to strengthen LLMs' ability to detect misinformation. Our study provides valuable insights into LLMs' interactions with misinformation, and we believe MisBench can serve as an effective benchmark for evaluating LLM-based detectors and enhancing their reliability in real-world applications. Code and data are available at https://github.com/GKNL/MisBench.

Problem

Research questions and friction points this paper is trying to address.

How misinformation influences LLM behaviors and preferences

Lack of fine-grained analysis of LLM vulnerability to misinformation

Need for a comprehensive benchmark to evaluate LLM misinformation detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

MisBench, currently the largest benchmark for evaluating LLMs on misinformation

Novel Reconstruct to Discriminate (RtD) detection approach

Analysis of how knowledge conflicts and stylistic variations affect LLMs
