Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization

📅 2025-05-31
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Hallucinations in large language models (LLMs) pose serious risks to clinical decision-making in medical dialogue summarization, yet existing research is scarce and general-purpose hallucination detectors perform poorly in clinical settings. Method: We introduce the first dual-track expert-annotated dataset, featuring both controlled factual omissions and naturally occurring hallucinations, and propose a fact-controlled hallucination generation paradigm. We design an interpretable fact-counting detection framework that integrates factual extraction and comparison, fine-tuned and prompt-engineered LLM-based identification, and Leave-N-out construction of controllable data. Contribution/Results: Our method achieves a 27.3% improvement in F1-score for natural hallucination detection. We release the first evaluation suite specific to clinical hallucinations, comprising two benchmark datasets and multi-dimensional metrics, validated and adopted by domain experts.
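To make the prompt-engineered identification track concrete, here is a minimal sketch of what an LLM-based detector can look like. The prompt wording, the DETECTION_PROMPT name, and the call_llm helper are hypothetical illustrations, not the paper's published prompts or API.

```python
from typing import Callable

# Hypothetical prompt -- the paper does not publish its exact prompts.
DETECTION_PROMPT = """You are auditing a clinical summary.
Dialogue:
{dialogue}

Summary:
{summary}

List every statement in the summary that is NOT supported by the
dialogue, one per line. If all statements are supported, reply NONE."""

def detect_hallucinations(
    dialogue: str, summary: str, call_llm: Callable[[str], str]
) -> list[str]:
    """Return the unsupported summary statements flagged by the LLM.
    call_llm is a stand-in for any chat-completion API."""
    reply = call_llm(DETECTION_PROMPT.format(dialogue=dialogue, summary=summary))
    lines = [ln.strip() for ln in reply.splitlines() if ln.strip()]
    return [] if lines == ["NONE"] else lines

# Toy run with a mock LLM that flags one statement.
flags = detect_hallucinations(
    "Patient reports a headache.",
    "Patient has a headache and fever.",
    call_llm=lambda p: "and fever",
)
print(flags)  # ['and fever']
```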

📝 Abstract
Hallucinations in large language models (LLMs) during summarization of patient-clinician dialogues pose significant risks to patient care and clinical decision-making. However, the phenomenon remains understudied in the clinical domain, with uncertainty surrounding the applicability of general-domain hallucination detectors. The rarity and randomness of hallucinations further complicate their investigation. In this paper, we conduct an evaluation of hallucination detection methods in the medical domain and construct two datasets for this purpose: a fact-controlled Leave-N-out dataset, generated by systematically removing facts from source dialogues to induce hallucinated content in summaries, and a natural hallucination dataset, arising organically during LLM-based medical summarization. We show that general-domain detectors struggle to detect clinical hallucinations, and that performance on fact-controlled hallucinations does not reliably predict effectiveness on natural hallucinations. We then develop fact-based approaches that count hallucinations, offering explainability not available with existing methods. Notably, our LLM-based detectors, developed using fact-controlled hallucinations, generalize well to detecting real-world clinical hallucinations. This research contributes a suite of specialized metrics supported by expert-annotated datasets to advance faithful clinical summarization systems.
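The Leave-N-out construction described in the abstract can be pictured with a short sketch. This is a minimal illustration under simplifying assumptions, not the paper's implementation: extract_facts naively treats each sentence as one fact, and summarize is any pluggable LLM summarizer, whereas the paper relies on expert annotation and controlled fact removal.

```python
import random
from typing import Callable, Dict, List

def extract_facts(dialogue: str) -> List[str]:
    """Naive fact splitter: one sentence = one fact.
    A real pipeline would use an LLM or a clinical IE model."""
    return [s.strip() for s in dialogue.split(".") if s.strip()]

def leave_n_out(
    dialogue: str,
    n: int,
    summarize: Callable[[str], str],
    seed: int = 0,
) -> Dict[str, object]:
    """Build one fact-controlled example: drop N facts from the source,
    summarize the reduced dialogue, and keep the removed facts as ground
    truth. Any summary content supported only by the removed facts is,
    by construction, a hallucination."""
    facts = extract_facts(dialogue)
    rng = random.Random(seed)
    removed = set(rng.sample(range(len(facts)), k=min(n, len(facts))))
    kept = [f for i, f in enumerate(facts) if i not in removed]
    reduced = ". ".join(kept) + "."
    return {
        "source": reduced,
        "summary": summarize(reduced),  # plug in any LLM summarizer here
        "removed_facts": [facts[i] for i in sorted(removed)],
    }

# Toy usage with a trivial "summarizer" that echoes the first kept fact.
example = leave_n_out(
    "Patient reports chest pain. Pain started two days ago. "
    "Patient takes aspirin daily. No history of diabetes.",
    n=2,
    summarize=lambda text: extract_facts(text)[0] + ".",
)
print(example["removed_facts"])
```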
Problem

Research questions and friction points this paper is trying to address.

Detecting hallucinations in medical text summarization by LLMs
Evaluating general-domain hallucination detectors in clinical settings
Developing explainable fact-based methods for clinical hallucination detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fact-controlled Leave-N-out dataset generation
Fact-based hallucination counting methods (see the sketch after this list)
LLM-based detectors for clinical hallucinations
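A minimal sketch of the fact-counting idea follows. The supported check here is a crude word-overlap placeholder standing in for the paper's LLM-based factual extraction and comparison; all names and thresholds are illustrative assumptions.

```python
from typing import List

def extract_facts(text: str) -> List[str]:
    """Naive fact splitter (one sentence = one fact); a real system
    would extract atomic clinical facts with an LLM."""
    return [s.strip() for s in text.split(".") if s.strip()]

def supported(fact: str, source: str, min_overlap: float = 0.6) -> bool:
    """Crude support check: fraction of the fact's content words that
    also appear in the source dialogue."""
    words = {w.lower() for w in fact.split() if len(w) > 3}
    if not words:
        return True
    source_words = {w.lower() for w in source.split()}
    return len(words & source_words) / len(words) >= min_overlap

def count_hallucinations(summary: str, source: str) -> int:
    """Explainable detector output: the number of summary facts that
    are not supported by the source dialogue."""
    return sum(1 for f in extract_facts(summary) if not supported(f, source))

source = "Patient reports chest pain. Pain started two days ago."
summary = "Patient has chest pain. Patient was prescribed nitroglycerin."
print(count_hallucinations(summary, source))  # 1: the medication fact
```

Counting, rather than binary flagging, is what makes the approach interpretable: each unsupported fact can be surfaced to a reviewer as evidence.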
👥 Authors
BN Suhas
Amazon, USA
Han-Chin Shing
Amazon Web Services
natural language processing, mental health, clinical NLP
Lei Xu
Amazon, USA
Mitch Strong
Amazon, USA
Jon Burnsky
Amazon, USA
Jessica Ofor
Amazon, USA
Jordan R. Mason
Amazon, USA
Susan Chen
Amazon, USA
Sundararajan Srinivasan
Amazon, USA
Chaitanya P. Shivade
Amazon, USA
Jack Moriarty
Amazon, USA
Joseph Paul Cohen
Amazon, AIMI (Stanford University), Mila (Quebec AI Institute)
Medical Imaging, Explainable AI, Genomics, Computer Vision, Representation Learning