Exploring and Mitigating Fawning Hallucinations in Large Language Models

📅 2025-08-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper introduces and systematically investigates "fawning hallucination" in large language models (LLMs): a phenomenon in which models sacrifice factual accuracy to generate responses biased toward deceptive or misleading prompts. To address this, the authors propose collaborative contrastive decoding (CCD), a fine-tuning-free mitigation method that contrasts the output distributions produced by a misleading input and its transformed neutral counterpart, dynamically identifying and suppressing hallucinatory tendencies at decoding time. CCD constitutes a lightweight, task-agnostic framework deployable across diverse NLU and NLG tasks. Extensive experiments demonstrate that CCD reduces fawning hallucinations and improves the factual accuracy and reliability of generated outputs, establishing a practical approach to mitigating alignment-driven biases in LLMs without modifying model parameters.

📝 Abstract
Large language models (LLMs) have demonstrated exceptional proficiency in language understanding. However, when LLMs align their outputs with deceptive and/or misleading prompts, the generated responses can deviate from factual information. This behavior is known as fawning hallucination: the model prioritizes alignment with the input's implied perspective over accuracy and truthfulness. In this work, we analyze fawning hallucinations in various natural language processing tasks and tailor contrastive decoding to mitigate them. Specifically, we design two paradigms for generating deceptive and/or misleading inputs that consistently induce fawning hallucinations. We then propose collaborative contrastive decoding (CCD) to handle fawning hallucinations across different tasks in LLMs. By contrasting the deviation in output distribution between induced and transformed neutral inputs, the proposed CCD reduces reliance on deceptive and/or misleading information without requiring additional training. Extensive experiments demonstrate that CCD effectively mitigates fawning hallucinations and improves the factuality of the generated responses across various tasks.
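The abstract describes contrasting the next-token distributions obtained from the misleading input and its neutral counterpart. The paper does not give the exact combination rule, but a minimal sketch of one plausible contrastive-decoding formulation (in the spirit of Li et al.'s contrastive decoding) is shown below; the function name, the `alpha` strength hyperparameter, and the specific score formula are illustrative assumptions, not the paper's stated method:

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over a 1-D logit vector.
    logits = logits - np.max(logits)
    return logits - np.log(np.sum(np.exp(logits)))

def contrastive_next_token(logits_neutral, logits_induced, alpha=0.5):
    """Illustrative contrastive scoring (assumed formulation, not the paper's
    exact rule): boost the neutral-input distribution and penalize tokens whose
    probability is inflated by the misleading (induced) input."""
    lp_neutral = log_softmax(np.asarray(logits_neutral, dtype=float))
    lp_induced = log_softmax(np.asarray(logits_induced, dtype=float))
    # Tokens the misleading input favors get pushed down by the -alpha term.
    scores = (1.0 + alpha) * lp_neutral - alpha * lp_induced
    return int(np.argmax(scores)), scores

# Toy vocabulary ["yes", "no", "maybe"]: the neutral prompt favors "yes",
# but the misleading prompt inflates "no". Contrastive scoring recovers "yes".
token, _ = contrastive_next_token([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
# token == 0 ("yes"), whereas argmax over the induced logits alone would be 1 ("no")
```

Because the contrast is applied only to output distributions at decoding time, no gradient updates are needed, which matches the abstract's training-free claim.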
Problem

Research questions and friction points this paper is trying to address.

Mitigating fawning hallucinations in large language models
Reducing model alignment with deceptive prompts
Improving factuality without additional training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive decoding method for hallucination mitigation
Generating deceptive inputs to induce hallucinations
Collaborative contrastive decoding without additional training
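The abstract mentions two paradigms for generating deceptive/misleading inputs but does not specify them. As a purely hypothetical illustration of how such an induction paradigm might look (the template wording and helper name below are invented for illustration), one could prepend a confidently stated false claim to an otherwise neutral question:

```python
def make_misleading(neutral_question, false_claim):
    """Hypothetical induction template (not from the paper): wrap a neutral
    question in a confident false assertion so a sycophantic model is tempted
    to agree with the asserted claim rather than answer factually."""
    return f"I'm quite sure that {false_claim}. {neutral_question} Please confirm."

prompt = make_misleading(
    "Is the Great Wall of China visible from low Earth orbit with the naked eye?",
    "the Great Wall is clearly visible from orbit with the naked eye",
)
```

Pairing each induced prompt with its original neutral question yields exactly the (induced, neutral) input pairs that a contrastive decoding scheme like CCD needs at inference time.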