Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the pervasive hallucination and poor traceability of large language models (LLMs) in medical long-context question answering, this paper proposes an atomic fact verification framework: answers are decomposed into minimally verifiable semantic units, each automatically validated against an authoritative medical guideline knowledge base. The method integrates retrieval-augmented generation (RAG), atomic-level fact extraction and structured representation, knowledge-base-driven factual validation, and a multi-expert collaborative evaluation protocol. Its key innovation lies in the first implementation of fine-grained fact decomposition and direct source-document tracing in medical QA, simultaneously enhancing reliability and explainability. Experimental results demonstrate a 40% improvement in overall answer quality, a 50% hallucination detection rate, and precise traceability of every atomic fact to its original literature excerpt.

📝 Abstract
Large language models (LLMs) exhibit extensive medical knowledge but are prone to hallucinations and inaccurate citations, which pose a challenge to their clinical adoption and regulatory compliance. Current methods, such as Retrieval Augmented Generation, partially address these issues by grounding answers in source documents, but hallucinations and low fact-level explainability persist. In this work, we introduce a novel atomic fact-checking framework designed to enhance the reliability and explainability of LLMs used in medical long-form question answering. This method decomposes LLM-generated responses into discrete, verifiable units called atomic facts, each of which is independently verified against an authoritative knowledge base of medical guidelines. This approach enables targeted correction of errors and direct tracing to source literature, thereby improving the factual accuracy and explainability of medical Q&A. Extensive evaluation using multi-reader assessments by medical experts and an automated open Q&A benchmark demonstrated significant improvements in factual accuracy and explainability. Our framework achieved up to a 40% overall answer improvement and a 50% hallucination detection rate. The ability to trace each atomic fact back to the most relevant chunks from the database provides a granular, transparent explanation of the generated responses, addressing a major gap in current medical AI applications. This work represents a crucial step towards more trustworthy and reliable clinical applications of LLMs, addressing key prerequisites for clinical application and fostering greater confidence in AI-assisted healthcare.
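The verify-each-fact loop the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy knowledge base, the sentence-level fact splitter, the Jaccard-overlap retriever, and the support threshold are all assumptions standing in for the paper's LLM-based decomposition and retrieval over medical guidelines.

```python
from dataclasses import dataclass
from typing import Optional

# Toy "guideline knowledge base": chunk id -> chunk text.
# The paper uses an authoritative medical guideline corpus with RAG-style
# retrieval; this dict and the overlap scorer below are illustrative stand-ins.
KB = {
    "chunk-1": "Metformin is the recommended first-line therapy for type 2 diabetes.",
    "chunk-2": "Annual eye examinations are advised for patients with diabetes.",
}

@dataclass
class VerifiedFact:
    fact: str              # the atomic fact extracted from the answer
    supported: bool        # whether any KB chunk supports it
    source: Optional[str]  # id of the best-matching chunk, for traceability

def decompose(answer: str) -> list:
    """Stand-in for LLM-based atomic fact extraction: one fact per sentence."""
    return [s.strip() for s in answer.split(".") if s.strip()]

def overlap(a: str, b: str) -> float:
    """Jaccard token overlap as a crude support score (assumption)."""
    ta = {w.strip(".,") for w in a.lower().split()}
    tb = {w.strip(".,") for w in b.lower().split()}
    return len(ta & tb) / len(ta | tb)

def verify(answer: str, threshold: float = 0.25) -> list:
    """Check each atomic fact against the KB and record its best source."""
    results = []
    for fact in decompose(answer):
        best_id, best = max(
            ((cid, overlap(fact, txt)) for cid, txt in KB.items()),
            key=lambda pair: pair[1],
        )
        ok = best >= threshold
        results.append(VerifiedFact(fact, ok, best_id if ok else None))
    return results

if __name__ == "__main__":
    answer = ("Metformin is the recommended first-line therapy for type 2 diabetes. "
              "Insulin cures diabetes permanently")
    for v in verify(answer):
        print(v)
```

In this sketch the first fact traces back to `chunk-1` while the fabricated second fact finds no supporting chunk and is flagged, mirroring the hallucination detection and source-tracing behavior the framework reports.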
Problem

Research questions and friction points this paper is trying to address.

Reducing hallucinations in medical LLM responses
Enhancing fact-level explainability in medical Q&A
Improving traceability to authoritative medical sources
Innovation

Methods, ideas, or system contributions that make the work stand out.

Atomic fact-checking framework enhances reliability
Decomposes responses into verifiable medical facts
Traces facts to authoritative medical guidelines
Juraj Vladika
PhD Student of Computer Science, Technical University of Munich
artificial intelligence · machine learning · natural language processing · information retrieval
Annika Domres
Department of Radiation Oncology, TUM University Hospital Rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
Mai Nguyen
Department of Radiation Oncology, TUM University Hospital Rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
Rebecca Moser
Department of Radiation Oncology, TUM University Hospital Rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
Jana Nano
Department of Radiation Oncology, TUM University Hospital Rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
Felix Busch
Medical Doctor @ Technical University Munich
Radiology · Artificial intelligence · Deep learning · Large Language Models
Lisa C. Adams
Department of Diagnostic and Interventional Radiology, TUM University Hospital Rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
K. Bressem
Department of Diagnostic and Interventional Radiology, TUM University Hospital Rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
Denise Bernhardt
Department of Radiation Oncology, TUM University Hospital Rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
Stephanie E. Combs
Department of Radiation Oncology, TUM University Hospital Rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
K. Borm
Department of Radiation Oncology, TUM University Hospital Rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
Florian Matthes
Professor of Computer Science, Technische Universität München
Software Engineering · Enterprise Architecture · NLP · LegalTech · Blockchain
J. Peeken
Department of Radiation Oncology, TUM University Hospital Rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany