RadioRAG: Factual large language models for enhanced diagnostics in radiology using online retrieval augmented generation

📅 2024-07-22
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Large language models (LLMs) suffer from factual inaccuracies in radiological diagnosis because their knowledge is static and can become outdated. To address this, we propose an end-to-end, real-time, radiology-specific RAG framework that dynamically retrieves from authoritative sources (e.g., Radiopaedia), integrates semantic retrieval with zero-shot prompt engineering, and supports plug-and-play integration of multiple LLMs (GPT, Mistral, Llama3). Our key contributions are: (1) overcoming the limitations of static knowledge bases by enabling continuous, real-time updating of clinical knowledge; and (2) the first systematic characterization of RAG's performance gains across radiological subspecialties, revealing significant heterogeneity among models, particularly in breast and emergency radiology. Evaluated on the RSNA case dataset and an expert-annotated question bank, the framework achieves relative improvements in diagnostic accuracy of up to 54% over non-RAG baselines, matching or exceeding both non-augmented LLMs and human radiologists, most notably in breast imaging and emergency radiology.
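The retrieve-then-prompt loop the summary describes can be sketched in miniature. Everything below is an illustrative assumption: the toy snippets stand in for crawled Radiopaedia articles, a bag-of-words cosine score stands in for the framework's semantic (embedding-based) retrieval, and the final LLM call is omitted; none of it reproduces the authors' implementation.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    # Lowercased word counts stand in for a real semantic embedding.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, corpus, k=2):
    # Rank snippets by similarity to the question and keep the top k.
    q = bag_of_words(question)
    ranked = sorted(corpus, key=lambda t: cosine(q, bag_of_words(corpus[t])),
                    reverse=True)
    return [corpus[t] for t in ranked[:k]]

def build_prompt(question, snippets):
    # Zero-shot prompt: retrieved context first, then the question.
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Toy stand-in for crawled Radiopaedia articles (invented snippets).
corpus = {
    "pneumothorax": "Pneumothorax: air in the pleural space without lung markings.",
    "pleural effusion": "Pleural effusion: pleural fluid blunting the costophrenic angle.",
    "cardiomegaly": "Cardiomegaly: enlarged cardiac silhouette on chest radiograph.",
}

question = "Which condition causes blunting of the costophrenic angle?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)
```

In the real system this prompt would be handed to the chosen LLM backend; the plug-and-play aspect comes from the fact that only this final generation step changes when swapping models.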

📝 Abstract
Large language models (LLMs) often generate outdated or inaccurate information because they rely on static training datasets. Retrieval augmented generation (RAG) mitigates this by integrating external data sources. While previous RAG systems used pre-assembled, fixed databases with limited flexibility, we have developed Radiology RAG (RadioRAG), an end-to-end framework that retrieves data from authoritative radiologic online sources in real time. We evaluated the diagnostic accuracy of various LLMs when answering radiology-specific questions with and without access to additional online information via RAG. Using 80 questions from the RSNA Case Collection across radiologic subspecialties and 24 additional expert-curated questions with reference standard answers, LLMs (GPT-3.5-turbo, GPT-4, Mistral-7B, Mixtral-8x7B, and Llama3 [8B and 70B]) were prompted with and without RadioRAG in a zero-shot inference scenario. RadioRAG retrieved context-specific information from www.radiopaedia.org in real time. Accuracy was assessed, with statistical analyses performed using bootstrapping, and the results were compared with human performance. RadioRAG improved diagnostic accuracy across most LLMs, with relative accuracy increases of up to 54%. It matched or exceeded non-RAG models and the human radiologist in question answering across radiologic subspecialties, particularly in breast imaging and emergency radiology. However, the degree of improvement varied among models: GPT-3.5-turbo and Mixtral-8x7B-instruct-v0.1 saw notable gains, while Mistral-7B-instruct-v0.2 showed no improvement, highlighting variability in RadioRAG's effectiveness. LLMs benefit from access to domain-specific data beyond their training data. For radiology, RadioRAG establishes a robust framework that substantially improves diagnostic accuracy and factuality in radiological question answering.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Radiology Diagnosis
Accuracy Limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

RadioRAG
Real-time Information Retrieval
Radiology Diagnosis Enhancement
Soroosh Tayebi Arasteh
RWTH Aachen University
Deep Learning, AI in Medicine, Generative AI, Medical Image Analysis
Mahshad Lotfinia
RWTH Aachen University
Artificial Intelligence, Deep Learning, Medical Image Analysis
Keno Bressem
Technical University Munich
deep learning, radiomics, microwave ablation
Robert Siepmann
Department of Diagnostic and Interventional Radiology, University Hospital Aachen
Dyke Ferber
Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
Christiane Kuhl
Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
Jakob Nikolas Kather
Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany; Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
Sven Nebelung
Department of Diagnostic and Interventional Radiology, University Hospital Aachen
Advanced MRI Techniques, Functionality Assessment, Biomechanical Imaging, Cartilage, Artificial Intelligence
Daniel Truhn
Professor of Radiology, University Hospital Aachen
Machine Learning, Artificial Intelligence, Computer Vision, Medical Imaging