Retrieval-Augmented Generation in Medicine: A Scoping Review of Technical Implementations, Clinical Applications, and Ethical Considerations

📅 2025-11-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Medical RAG faces several open challenges: underuse of private clinical data, weak multilingual support (particularly for non-English languages), poor adaptability to low-resource settings, and insufficient evaluation of safety and bias. This scoping review surveys RAG technical implementations, clinical applications (question answering, report generation, text summarization, and information extraction), and ethical considerations in medicine. It finds that current work relies heavily on public datasets, English-centric embedding models, and general-purpose LLMs, with limited use of medical-specific LLMs. The authors argue that progress requires clinical validation, cross-lingual adaptation, support for low-resource settings, and evaluation that combines automated metrics with human review and explicitly covers safety and bias. These directions point toward trustworthy, accessible, and responsible medical RAG systems for global use.

📝 Abstract

The rapid growth of medical knowledge and the increasing complexity of clinical practice pose challenges. In this context, large language models (LLMs) have demonstrated value; however, inherent limitations remain. Retrieval-augmented generation (RAG) technologies show potential to enhance their clinical applicability. This study reviewed RAG applications in medicine. We found that research primarily relied on publicly available data, with limited application to private data. For retrieval, approaches commonly relied on English-centric embedding models, while the LLMs used were mostly general-purpose, with limited use of medical-specific LLMs. For evaluation, automated metrics assessed generation quality and task performance, whereas human evaluation focused on accuracy, completeness, relevance, and fluency, with insufficient attention to bias and safety. RAG applications were concentrated in question answering, report generation, text summarization, and information extraction. Overall, medical RAG remains at an early stage, requiring advances in clinical validation, cross-linguistic adaptation, and support for low-resource settings to enable trustworthy and responsible global use.
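To make the retrieve-then-generate pattern concrete, here is a minimal Python sketch of the retrieval and prompt-augmentation steps. It is an illustration under stated assumptions, not a method from the review: the bag-of-words `embed` function is a hypothetical stand-in for a real (ideally multilingual or medical-domain) embedding model, the toy documents and the helpers `retrieve` and `build_prompt` are hypothetical, and the LLM call that would consume the prompt is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for a medical knowledge base
# (illustrative text only, not clinical guidance).
DOCUMENTS = [
    "Metformin is a first-line oral medication for type 2 diabetes.",
    "Amoxicillin is a penicillin antibiotic used for bacterial infections.",
    "Hypertension is often managed with ACE inhibitors or thiazide diuretics.",
]


def embed(text: str) -> Counter:
    """Assumed stand-in 'embedding': a bag-of-words term-frequency vector.
    A real system would call a (possibly multilingual) embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Augment the question with numbered retrieved passages for the LLM."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below and cite passage numbers.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    question = "What is a first-line medication for type 2 diabetes?"
    top = retrieve(question, DOCUMENTS, k=1)
    print(build_prompt(question, top))
```

In this sketch, replacing `embed` with a multilingual encoder is where the cross-linguistic adaptation the abstract calls for would enter; the ranking and prompt-assembly logic would stay the same.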

Problem

Reviewing RAG technical implementations and clinical applications in medicine

Addressing limitations of large language models for medical knowledge retrieval

Evaluating RAG systems for clinical validation and ethical considerations

Innovation

Mapping how retrieval-augmented generation is applied across medical tasks

Identifying the reliance on English-centric embedding models for retrieval

Characterizing the automated metrics and human evaluation used for assessment