DeepER-Med: Advancing Deep Evidence-Based Research in Medicine Through Agentic AI

📅 2026-04-16
📈 Citations: 0
Influential: 0

🤖 AI Summary
Most existing deep research systems for medicine lack explicit, inspectable criteria for evidence appraisal, and current benchmarks rarely reflect the complexity of real-world medical questions, which makes the reliability of their outputs hard to assess. This work proposes DeepER-Med, an agentic framework for deep evidence-based research in medicine that structures the process as an explicit, inspectable workflow with three modules: research planning, agentic collaboration, and evidence synthesis. The authors also construct DeepER-MedQA, a benchmark of 100 expert-level research questions drawn from authentic medical research scenarios and curated by a multidisciplinary panel of 11 biomedical experts. In expert manual evaluation, DeepER-Med consistently outperforms widely used production-grade platforms across multiple criteria, including the generation of novel scientific insights, and clinicians judged its conclusions to align with clinical recommendations in seven of eight real-world clinical cases.

📝 Abstract
Trustworthiness and transparency are essential for the clinical adoption of artificial intelligence (AI) in healthcare and biomedical research. Recent deep research systems aim to accelerate evidence-grounded scientific discovery by integrating AI agents with multi-hop information retrieval, reasoning, and synthesis. However, most existing systems lack explicit and inspectable criteria for evidence appraisal, creating a risk of compounding errors and making it difficult for researchers and clinicians to assess the reliability of their outputs. In parallel, current benchmarking approaches rarely evaluate performance on complex, real-world medical questions. Here, we introduce DeepER-Med, a Deep Evidence-based Research framework for Medicine with an agentic AI system. DeepER-Med frames deep medical research as an explicit and inspectable workflow of evidence-based generation, consisting of three modules: research planning, agentic collaboration, and evidence synthesis. To support realistic evaluation, we also present DeepER-MedQA, an evidence-grounded dataset comprising 100 expert-level research questions derived from authentic medical research scenarios and curated by a multidisciplinary panel of 11 biomedical experts. Expert manual evaluation demonstrates that DeepER-Med consistently outperforms widely used production-grade platforms across multiple criteria, including the generation of novel scientific insights. We further demonstrate the practical utility of DeepER-Med through eight real-world clinical cases. Human clinician assessment indicates that DeepER-Med's conclusions align with clinical recommendations in seven cases, highlighting its potential for medical research and decision support.
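
The three-module workflow named in the abstract (research planning, agentic collaboration, evidence synthesis) can be pictured with a minimal sketch. Everything below is an illustrative assumption and not the authors' implementation: the `Evidence` and `Report` types, the `EVIDENCE_RANK` table, the `min_rank` threshold, and the toy agent are all hypothetical. The sketch only shows what it means for an appraisal criterion to be explicit and inspectable: every retrieved item is either kept or rejected by a rule that a reader can see, and both lists are retained in the report.

```python
from dataclasses import dataclass, field

# Hypothetical appraisal table; NOT the criteria used by DeepER-Med.
EVIDENCE_RANK = {"guideline": 3, "rct": 2, "observational": 1}


@dataclass
class Evidence:
    source: str   # placeholder identifier for where the item came from
    design: str   # study design, looked up in EVIDENCE_RANK
    claim: str    # what the item asserts


@dataclass
class Report:
    question: str
    plan: list
    kept: list = field(default_factory=list)
    rejected: list = field(default_factory=list)


def plan_research(question):
    """Stage 1 (research planning): split the question into sub-questions."""
    return [f"primary studies on: {question}", f"guidance on: {question}"]


def synthesize(question, plan, evidence, min_rank=2):
    """Stage 3 (evidence synthesis): apply the explicit appraisal rule."""
    report = Report(question, plan)
    for ev in evidence:
        rank = EVIDENCE_RANK.get(ev.design, 0)  # unknown designs rank lowest
        (report.kept if rank >= min_rank else report.rejected).append(ev)
    # Strongest evidence first; rejected items stay visible for auditing.
    report.kept.sort(key=lambda e: EVIDENCE_RANK.get(e.design, 0), reverse=True)
    return report


def run(question, agents, min_rank=2):
    """Chain the three stages; each agent maps a sub-question to Evidence."""
    plan = plan_research(question)
    # Stage 2 (agentic collaboration): every agent handles every sub-question.
    evidence = [ev for sub_q in plan for agent in agents for ev in agent(sub_q)]
    return synthesize(question, plan, evidence, min_rank)


def toy_agent(sub_question):
    """Hypothetical stand-in for a retrieval agent; returns fixed toy items."""
    return [
        Evidence("toy-trial", "rct", "placeholder claim A"),
        Evidence("toy-cohort", "observational", "placeholder claim B"),
    ]


report = run("example question", [toy_agent])
print(len(report.kept), "kept;", len(report.rejected), "rejected")
```

Keeping the rejected items alongside the kept ones is the design choice that matters here: it lets a reviewer check why a source was excluded instead of trusting an opaque filter.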
Problem

Research questions and friction points this paper is trying to address.

evidence appraisal
trustworthiness
transparency
medical AI
benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic AI
Evidence-Based Medicine
Deep Research Framework
Trustworthy AI
Medical Benchmarking
Zhizheng Wang
Postdoc, Division of Intramural Research (DIR), NLM, NIH
Large Language Models, Representation Learning, Graph Data Mining, Bioinformatics
Chih-Hsuan Wei
Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20892, USA
Joey Chan
Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20892, USA
Robert Leaman
Staff Scientist, NCBI/NLM/NIH
Natural Language Processing, Machine Learning
Chi-Ping Day
Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD 20892, USA
Chuan Wu
Professor of Computer Science, The University of Hong Kong
Cloud computing, distributed machine learning algorithms and systems
Mark A Knepper
Epithelial Systems Biology Laboratory, Systems Biology Center, National Heart, Lung, and Blood Institute (NHLBI), National Institutes of Health (NIH), Bethesda, MD 20892, USA
Antolin Serrano Farias
Brain Tumor Genetics Lab, Department of Neurosurgery, Johns Hopkins Hospital, Baltimore, MD 21287, USA
Jordina Rincon-Torroella
Brain Tumor Genetics Lab, Department of Neurosurgery, Johns Hopkins Hospital, Baltimore, MD 21287, USA
Hasan Slika
Hunterian Neurosurgical Laboratory, Johns Hopkins School of Medicine, Baltimore, MD 21231, USA
Betty Tyler
Hunterian Neurosurgical Laboratory, Johns Hopkins School of Medicine, Baltimore, MD 21231, USA
Ryan Huu-Tuan Nguyen
Division of Hematology/Oncology, University of Illinois Chicago, Chicago, IL 60612, USA
Asmita Indurkar
Division of Epidemiology and Clinical Applications, National Eye Institute (NEI), National Institutes of Health (NIH), Bethesda, MD 20892, USA
Mélanie Hébert
Division of Epidemiology and Clinical Applications, National Eye Institute (NEI), National Institutes of Health (NIH), Bethesda, MD 20892, USA
Shubo Tian
Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20892, USA
Lauren He
Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20892, USA
Noor Naffakh
University of Illinois Cancer Center, Chicago, IL 60612, USA
Aseem Aseem
University of Illinois Cancer Center, Chicago, IL 60612, USA
Nicholas Wan
University of Michigan Medical School, Ann Arbor, MI 48109, USA
Emily Y Chew
Division of Epidemiology and Clinical Applications, National Eye Institute (NEI), National Institutes of Health (NIH), Bethesda, MD 20892, USA
Tiarnan D L Keenan
Division of Epidemiology and Clinical Applications, National Eye Institute (NEI), National Institutes of Health (NIH), Bethesda, MD 20892, USA
Zhiyong Lu
Senior Investigator, NLM; Adjunct Professor of CS, UIUC
BioNLP, Biomedical Informatics, Medical AI, Artificial Intelligence