🤖 AI Summary
Medical RAG faces critical challenges, including underuse of private clinical data, weak multilingual support (particularly for non-English languages), poor adaptability in low-resource settings, and insufficient evaluation of safety and bias. This study systematically reviews RAG architectures, clinical applications (e.g., question answering, report generation, summarization, and information extraction), and ethical risks in medicine. It finds that current practice over-relies on public datasets, English-centric embedding models, and general-purpose LLMs. Looking ahead, it calls for cross-lingual adaptation, clinician-in-the-loop evaluation grounded in real-world clinical validation, greater use of domain-specific medical LLMs, and multidimensional hybrid (human and automated) evaluation that explicitly covers safety and fairness. Together, these directions provide a foundation and practical pathways toward trustworthy, accessible, and responsible medical RAG systems worldwide.
📝 Abstract
The rapid growth of medical knowledge and the increasing complexity of clinical practice pose challenges. In this context, large language models (LLMs) have demonstrated value, but they have inherent limitations. Retrieval-augmented generation (RAG) technologies show potential to enhance their clinical applicability. This study reviewed RAG applications in medicine. We found that research relied primarily on publicly available data, with limited use of private data. For retrieval, approaches commonly relied on English-centric embedding models; for generation, LLMs were mostly general-purpose, with limited use of medical-specific LLMs. For evaluation, automated metrics assessed generation quality and task performance, whereas human evaluation focused on accuracy, completeness, relevance, and fluency, with insufficient attention to bias and safety. RAG applications concentrated on question answering, report generation, text summarization, and information extraction. Overall, medical RAG remains at an early stage and requires advances in clinical validation, cross-linguistic adaptation, and support for low-resource settings to enable trustworthy and responsible global use.
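To make the retrieve-then-generate pattern concrete, here is a minimal Python sketch of the retrieval and prompt-assembly steps only. It is an illustration under stated assumptions, not the reviewed systems' method: a toy bag-of-words vector stands in for a trained embedding model, the three-document corpus is hypothetical, and the helper names (`embed`, `cosine`, `retrieve`, `build_prompt`) are illustrative. The generation step is omitted, since the assembled prompt would simply be passed to whichever LLM a system uses.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for a medical knowledge base.
DOCUMENTS = [
    "Metformin is a commonly prescribed medication for type 2 diabetes.",
    "Amoxicillin is a penicillin-class antibiotic used for bacterial infections.",
    "Hypertension is often treated with ACE inhibitors or diuretics.",
]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real system uses a trained model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Augment the question with numbered retrieved passages for the LLM."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below and cite passage numbers.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    question = "What medication is used for type 2 diabetes?"
    top = retrieve(question, DOCUMENTS, k=1)
    print(build_prompt(question, top))
```

Because retrieval quality depends entirely on `embed`, this is the component where an English-centric versus a multilingual embedding model would make the difference for non-English queries.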