A Multi-Agent Framework for Medical AI: Leveraging Fine-Tuned GPT, LLaMA, and DeepSeek R1 for Evidence-Based and Bias-Aware Clinical Query Processing

📅 2026-02-15
🤖 AI Summary
This work addresses weak validation, insufficient evidence grounding, and unreliable confidence signalling in clinical large language models for medical question answering. To improve reliability and safety, the authors propose a modular multi-agent framework that orchestrates specialized agents for clinical reasoning, evidence retrieval, and answer refinement. The system integrates uncertainty quantification via Monte Carlo dropout and perplexity estimation, interpretability analysis using LIME and SHAP, and a lexical and sentiment-based bias detection mechanism. Built on fine-tuned GPT, LLaMA, and DeepSeek R1 models with PubMed-augmented evidence, the framework achieves 87% accuracy, a relevance score of about 0.80, and a perplexity of 4.13 after evidence augmentation, with a mean end-to-end latency of 36.5 seconds. In the generation benchmark, the fine-tuned DeepSeek R1 substantially outperforms the BioGPT baseline in zero-shot evaluation.
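The two uncertainty signals named in the summary can be illustrated with a minimal sketch. It assumes that per-token log-probabilities and repeated dropout-enabled scores are already available from a model; the function names `perplexity` and `mc_dropout_uncertainty` are hypothetical and are not taken from the paper's code.

```python
import math
from statistics import mean, pvariance


def perplexity(token_logprobs):
    """Perplexity: exp of the mean negative log-likelihood per token.

    token_logprobs holds natural-log probabilities, one per generated token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def mc_dropout_uncertainty(stochastic_scores):
    """Summarise repeated forward passes run with dropout left enabled.

    stochastic_scores holds one scalar per pass (for example, the
    probability assigned to the chosen answer). The spread across passes
    serves as the uncertainty estimate. How the scores are produced is
    model-specific and assumed here.
    """
    return mean(stochastic_scores), pvariance(stochastic_scores)
```

Lower perplexity means the model assigned higher probability to its own output, and a larger variance across dropout passes means a less stable prediction. How the paper combines the two into a single score is not stated in this summary.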

📝 Abstract
Large language models (LLMs) show promise for healthcare question answering, but clinical use is limited by weak verification, insufficient evidence grounding, and unreliable confidence signalling. We propose a multi-agent medical QA framework that combines complementary LLMs with evidence retrieval, uncertainty estimation, and bias checks to improve answer reliability. Our approach has two phases. First, we fine-tune three representative LLM families (GPT, LLaMA, and DeepSeek R1) on MedQuAD-derived medical QA data (20k+ question-answer pairs across multiple NIH domains) and benchmark generation quality. DeepSeek R1 achieves the strongest scores (ROUGE-1 0.536 ± 0.04; ROUGE-2 0.226 ± 0.03; BLEU 0.098 ± 0.018) and substantially outperforms the specialised biomedical baseline BioGPT in zero-shot evaluation. Second, we implement a modular multi-agent pipeline in which a Clinical Reasoning agent (fine-tuned LLaMA) produces structured explanations, an Evidence Retrieval agent queries PubMed to ground responses in recent literature, and a Refinement agent (DeepSeek R1) improves clarity and factual consistency; an optional human validation path is triggered for high-risk or high-uncertainty cases. Safety mechanisms include Monte Carlo dropout and perplexity-based uncertainty scoring, plus lexical and sentiment-based bias detection supported by LIME/SHAP-based analyses. In evaluation, the full system achieves 87% accuracy with relevance around 0.80, and evidence augmentation reduces uncertainty (perplexity 4.13) compared to base responses, with mean end-to-end latency of 36.5 seconds under the reported configuration. Overall, the results indicate that agent specialisation and verification layers can mitigate key single-model limitations and provide a practical, extensible design for evidence-based and bias-aware medical AI.
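The control flow of the second phase can be sketched as follows. This is an illustrative outline only: the agents are passed in as plain callables standing in for the fine-tuned LLaMA, PubMed retrieval, and DeepSeek R1 components, and every name here (`Answer`, `run_pipeline`, the `threshold` default) is a hypothetical choice rather than the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Answer:
    text: str
    evidence: List[str] = field(default_factory=list)
    uncertainty: float = 0.0
    needs_human_review: bool = False


def run_pipeline(
    question: str,
    reason: Callable[[str], str],             # Clinical Reasoning agent
    retrieve: Callable[[str], List[str]],     # Evidence Retrieval agent
    refine: Callable[[str, List[str]], str],  # Refinement agent
    score_uncertainty: Callable[[str], float],
    is_high_risk: Callable[[str], bool],
    threshold: float = 0.5,                   # assumed cut-off, not from the paper
) -> Answer:
    draft = reason(question)          # structured explanation
    evidence = retrieve(question)     # supporting literature
    final = refine(draft, evidence)   # clarity and factual consistency pass
    uncertainty = score_uncertainty(final)
    # Route to the optional human validation path for risky or uncertain cases.
    flag = is_high_risk(question) or uncertainty > threshold
    return Answer(final, evidence, uncertainty, flag)
```

Keeping each agent behind a simple callable interface is one way to get the modularity the abstract describes: a retrieval backend or refinement model can be swapped without changing the routing logic.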
Problem

Research questions and friction points this paper is trying to address.

medical AI
large language models
evidence grounding
bias awareness
clinical question answering
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent framework
evidence-based medical QA
uncertainty estimation
bias-aware AI
fine-tuned LLMs
Naeimeh Nourmohammadi
Department of Computing and Games, Teesside University, Middlesbrough, United Kingdom
Md Meem Hossain
Department of Computing and Games, Teesside University, Middlesbrough, United Kingdom; Centre for Digital Innovation, Teesside University, Middlesbrough, United Kingdom
The Anh Han
Professor of Computer Science, Teesside University
Evolutionary Game Theory, Artificial Intelligence, Evolution of Cooperation, Multi-agent Systems
Safina Showkat Ara
Faculty of Business & Technology, University of Sunderland, Sunderland, United Kingdom
Zia Ush Shamszaman
Department of Computing and Games, Teesside University, Middlesbrough, United Kingdom; Centre for Digital Innovation, Teesside University, Middlesbrough, United Kingdom