MIRIAD: Augmenting LLMs with millions of medical query-response pairs

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) in healthcare suffer from hallucination, while existing retrieval-augmented generation (RAG) systems rely heavily on noisy, poorly structured text. Method: We propose an LLM-optimized paradigm for structured medical knowledge encapsulation. Specifically, we construct MIRIAD—a high-quality, large-scale medical question-answering corpus (5.82 million QA pairs)—derived exclusively from peer-reviewed literature via a semi-automated pipeline integrating LLM generation, rule-based filtering, fact anchoring, and human verification. We introduce verifiable, retrievable QA pairs as a replacement for conventional text chunks and release MIRIAD-Atlas, an interactive, discipline-specific knowledge map. Contribution/Results: On medical QA benchmarks, our approach achieves up to a 6.7% absolute accuracy gain and improves hallucination detection F1 by 22.5–37%. It further enables multi-granularity retrieval and clinically oriented knowledge exploration.

📝 Abstract
LLMs are bound to transform healthcare with advanced decision support and flexible chat assistants. However, LLMs are prone to generating inaccurate medical content. To ground LLMs in high-quality medical knowledge, LLMs have been equipped with external knowledge via RAG, where unstructured medical knowledge is split into small text chunks that can be selectively retrieved and integrated into the LLM's context. Yet, existing RAG pipelines rely on raw, unstructured medical text, which can be noisy, uncurated, and difficult for LLMs to effectively leverage. Systematic approaches to organizing medical knowledge so as to best surface it to LLMs are generally lacking. To address these challenges, we introduce MIRIAD, a large-scale, curated corpus of 5,821,948 medical QA pairs, each rephrased from and grounded in a passage from peer-reviewed medical literature using a semi-automated pipeline combining LLM generation, filtering, grounding, and human annotation. Unlike prior medical corpora, which rely on unstructured text, MIRIAD encapsulates web-scale medical knowledge in an operationalized query-response format, which enables more targeted retrieval. Experiments on challenging medical QA benchmarks show that augmenting LLMs with MIRIAD improves accuracy by up to 6.7% compared to unstructured RAG baselines with the same source corpus and the same amount of retrieved text. Moreover, MIRIAD improved the ability of LLMs to detect medical hallucinations by 22.5 to 37% (increase in F1 score). We further introduce MIRIAD-Atlas, an interactive map of MIRIAD spanning 56 medical disciplines, enabling clinical users to visually explore, search, and refine medical knowledge. MIRIAD promises to unlock a wealth of downstream applications, including medical information retrievers, enhanced RAG applications, and knowledge-grounded chat interfaces, ultimately enabling more reliable LLM applications in healthcare.
Problem

Research questions and friction points this paper is trying to address.

LLMs generate inaccurate medical content without reliable grounding.
Existing RAG pipelines use noisy, unstructured medical text.
Lack of organized medical knowledge for effective LLM utilization.
Innovation

Methods, ideas, or system contributions that make the work stand out.

MIRIAD: 5.8M curated medical QA pairs
Semi-automated pipeline for QA generation
Query-response format enhances retrieval accuracy
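The core retrieval idea above — scoring the user query against stored *questions* and injecting the matched QA pairs into the prompt, rather than retrieving raw text chunks — can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the tiny `QA_PAIRS` list and the bag-of-words cosine scorer are hypothetical stand-ins for the 5.8M-pair corpus and whatever embedding retriever a real deployment would use.

```python
import math
import re
from collections import Counter

# Hypothetical miniature stand-in for MIRIAD-style QA pairs; the real corpus
# holds ~5.8M pairs grounded in peer-reviewed passages.
QA_PAIRS = [
    {"question": "What is the first-line treatment for anaphylaxis?",
     "answer": "Intramuscular epinephrine is the first-line treatment."},
    {"question": "Which vitamin deficiency causes scurvy?",
     "answer": "Vitamin C (ascorbic acid) deficiency causes scurvy."},
]

def _bow(text):
    """Lowercased bag-of-words vector (punctuation stripped)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def _cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, k=1):
    """Rank QA pairs by similarity of the query to the stored *question* text."""
    q = _bow(query)
    ranked = sorted(QA_PAIRS,
                    key=lambda p: _cosine(q, _bow(p["question"])),
                    reverse=True)
    return ranked[:k]

def build_context(query, k=1):
    """Format retrieved pairs as 'Q:/A:' blocks for insertion into an LLM prompt."""
    return "\n\n".join(f"Q: {p['question']}\nA: {p['answer']}"
                       for p in retrieve(query, k))

print(build_context("treatment for anaphylaxis"))
```

Matching against question text is what makes the query-response format more targeted than chunk retrieval: the stored questions live in the same distribution as user queries, so even a simple scorer surfaces the relevant answer.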
Salman Abdullah
Department of Computer Science, Stanford University, Stanford, CA, USA
Sam Rawal
Department of Internal Medicine, Mayo Clinic, Phoenix, AZ, USA
Cyril Zakka
Hugging Face, New York City, NY, USA
Sophie Ostmeier
Stanford University
Maximilian Purk
Hasso-Plattner-Institute for Digital Engineering, University of Potsdam, Potsdam, Germany
Eduardo Reis
Center for Artificial Intelligence in Medicine and Imaging, Stanford, CA, USA
Eric J. Topol
Scripps Translational Science Institute, San Diego, CA, USA
Jure Leskovec
Professor of Computer Science, Stanford University
Data mining · Machine Learning · Graph Neural Networks · Knowledge Graphs · Complex Networks
Michael Moor
MD, PhD. Assistant Professor at ETH Zurich. Previously: Stanford, Computer Science.
Medical AI · Foundation models · LLMs · Agents · Reasoning