Mitigating Hallucinations in Healthcare LLMs with Granular Fact-Checking and Domain-Specific Adaptation

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical large language models (LLMs) frequently generate hallucinated outputs, posing significant risks to clinical decision-making safety. Method: This paper proposes a decoupled fact-checking framework featuring an LLM-independent, fine-grained proposition-level verification module that jointly performs numerical consistency checking and discrete logical reasoning, augmented by a domain-specific summarization model fine-tuned on MIMIC-III. Contribution/Results: Departing from end-to-end paradigms, our approach enables EHR-driven, interpretable, and verifiable validation. Using LoRA-based fine-tuning and EHR-aligned modeling, the framework achieves an F1 score of 0.8556; the summarization model attains ROUGE-1 of 0.5797 and BERTScore of 0.9120. It delivers high-precision, trustworthy verification across 3,786 clinical propositions, substantially enhancing output reliability and clinical safety.
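The proposition-level numerical consistency check described in the summary could look roughly like the following sketch. This is a minimal illustration, not the paper's implementation; the function name, EHR field names, and tolerance parameter are all hypothetical:

```python
import re

def numbers_match_ehr(proposition: str, ehr_record: dict, tol: float = 0.0) -> bool:
    """Return True if every number stated in a generated proposition
    appears (within tolerance) among the patient's EHR values."""
    stated = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", proposition)]
    ehr_values = [float(v) for v in ehr_record.values()]
    return all(any(abs(s - v) <= tol for v in ehr_values) for s in stated)

# Hypothetical EHR record for one patient
ehr = {"heart_rate_bpm": 88, "serum_sodium_meq_l": 141}
print(numbers_match_ehr("Heart rate was 88 bpm; sodium measured 141 mEq/L.", ehr))  # True
print(numbers_match_ehr("Heart rate was 92 bpm.", ehr))  # False
```

Because the check runs against the EHR directly rather than through another LLM, a failure is interpretable: it pinpoints exactly which stated value has no support in the record.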

📝 Abstract
In healthcare, it is essential for any LLM-generated output to be reliable and accurate, particularly in cases involving decision-making and patient safety. However, LLM outputs are often unreliable in such critical settings due to the risk of hallucination. To address this issue, we propose a fact-checking module that operates independently of any LLM, along with a domain-specific summarization model designed to minimize hallucination rates. Our model is fine-tuned using Low-Rank Adaptation (LoRA) on the MIMIC-III dataset and is paired with the fact-checking module, which uses numerical tests for correctness and granular logical checks, via discrete logic in natural language processing (NLP), to validate facts against electronic health records (EHRs). We trained the summarization model on the full MIMIC-III dataset. To evaluate the fact-checking module, we sampled 104 summaries, decomposed them into 3,786 propositions, and used these as facts. The fact-checking module achieves a precision of 0.8904, a recall of 0.8234, and an F1-score of 0.8556. Additionally, the summarization model achieves a ROUGE-1 score of 0.5797 and a BERTScore of 0.9120 for summary quality.
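As a quick sanity check, the reported F1-score follows from the reported precision and recall, since F1 is their harmonic mean:

```python
# Precision and recall reported for the fact-checking module
precision, recall = 0.8904, 0.8234

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8556, matching the reported F1-score
```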
Problem

Research questions and friction points this paper is trying to address.

Mitigating hallucinations in healthcare LLMs for reliability
Using granular fact-checking against EHRs to ensure accuracy
Employing domain-specific adaptation to reduce hallucination rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Independent fact-checking module using numerical and logical tests
Domain-specific summarization model fine-tuned with LoRA on MIMIC-III
Granular validation against electronic health records to reduce hallucinations
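A minimal illustration of the granular logical check on a single proposition: a stated finding should match the EHR, and a negated finding should match its absence. The data structure and field names here are hypothetical, not the paper's actual implementation:

```python
def polarity_consistent(prop: dict, ehr_findings: set) -> bool:
    """A proposition asserting a finding should be present in the EHR;
    a negated proposition should correspond to its absence."""
    present = prop["finding"] in ehr_findings
    return present == (not prop["negated"])

# Hypothetical set of findings extracted from a patient's EHR
ehr_findings = {"pneumonia", "hypertension"}
print(polarity_consistent({"finding": "pneumonia", "negated": False}, ehr_findings))  # True
print(polarity_consistent({"finding": "sepsis", "negated": False}, ehr_findings))     # False
```

Running such checks per proposition, rather than over the whole summary, is what makes the validation interpretable: each failed proposition can be flagged individually.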
Musarrat Zeba
Applied Artificial Intelligence and INtelligent Systems (AAIINS) Laboratory, Dhaka 1217, Bangladesh; Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
Abdullah Al Mamun
Applied Artificial Intelligence and INtelligent Systems (AAIINS) Laboratory, Dhaka 1217, Bangladesh; Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
Kishoar Jahan Tithee
Applied Artificial Intelligence and INtelligent Systems (AAIINS) Laboratory, Dhaka 1217, Bangladesh; Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
Debopom Sutradhar
Applied Artificial Intelligence and INtelligent Systems (AAIINS) Laboratory, Dhaka 1217, Bangladesh; Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
Mohaimenul Azam Khan Raiaan
PhD Student, Monash University
Computer Vision, Explainable AI, Artificial Life, Large Language Model
Saddam Mukta
Post-doctoral Researcher, LUT University
#LUTsoftware, Artificial Intelligence, Machine Learning, NLP, Social Network Mining
Reem E. Mohamed
Faculty of Science and Information Technology, Charles Darwin University, Sydney, NSW, Australia
Md Rafiqul Islam
Faculty of Science and Technology, Charles Darwin University, Casuarina, NT 0909, Australia
Yakub Sebastian
Faculty of Science and Technology, Charles Darwin University, Casuarina, NT 0909, Australia
Mukhtar Hussain
Faculty of Science and Information Technology, Charles Darwin University, Sydney, NSW, Australia
Sami Azam
Faculty of Science and Technology, Charles Darwin University, Casuarina, NT 0909, Australia