MedGPT-oss: Training a General-Purpose Vision-Language Model for Biomedicine

📅 2026-02-28
🤖 AI Summary
This work proposes a lightweight, open-source general-purpose biomedical vision-language model designed for efficient local deployment under strict patient privacy and Protected Health Information (PHI) compliance requirements—addressing the limitations of existing high-performance multimodal systems that are either closed-source or computationally prohibitive. Built upon a GPT-oss language backbone and a vision frontend, the model leverages a three-stage domain-adaptive training strategy, high-quality data curation, and long-context multimodal alignment to achieve strong performance on consumer-grade GPUs. It outperforms larger open-source medical models on both out-of-distribution multimodal reasoning and complex text-only clinical tasks. The authors release the full training recipe, model weights, and evaluation toolkit to support reproducibility and community adoption.

📝 Abstract
Biomedical multimodal assistants have the potential to unify radiology, pathology, and clinical-text reasoning, yet a critical deployment gap remains: top-performing systems are either closed-source or computationally prohibitive, precluding the on-premises deployment required for patient privacy and PHI compliance. We introduce MEDGPT-OSS, an open-weight, 20B-parameter generalist vision-language model designed to facilitate open research in clinical AI. Rather than relying on architectural complexity, MEDGPT-OSS pairs the GPT-oss language backbone with a visual front-end via an optimized, three-stage training curriculum. By progressively domain-adapting these modules through rigorous data curation and long-context multimodal alignment, we demonstrate that a 20B model can bridge the capacity gap. It outperforms larger open medical models on out-of-distribution (OOD) multimodal reasoning and complex text-only clinical tasks. By unifying diverse modalities under a single instruction-following interface, MEDGPT-OSS maintains a parameter-efficient footprint fully compatible with commodity GPUs. We release the complete training recipe, open-weight checkpoints, and a rigorous evaluation harness to serve as a verifiable foundation for privacy-preserving, institution-specific clinical AI research.
Problem

Research questions and friction points this paper is trying to address.

biomedical multimodal assistants
closed-source
computationally prohibitive
on-premises deployment
PHI compliance
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language model
open-weight
multimodal alignment
domain adaptation
clinical AI
Kai Zhang
Lehigh University
AI in Healthcare, Deep Learning

Zhengqing Yuan
PhD student, University of Notre Dame
NLP, Deep Learning, CV

Cheng Peng
Department of Health Outcomes & Biomedical Informatics, University of Florida

Songlin Zhao
Department of Computer Science and Engineering, Lehigh University

Mengxian Lyu
Department of Health Outcomes & Biomedical Informatics, University of Florida

Ziyi Chen
Department of Health Outcomes & Biomedical Informatics, University of Florida

Yanfang Ye
Department of Computer Science and Engineering, University of Notre Dame

Wei Liu
Medical College of Wisconsin

Ying Zhang
Research Computing, University of Florida

Kaleb E Smith
AI Technology Center, NVIDIA

Lifang He
Associate Professor of Computer Science, Lehigh University
Machine Learning, AI for Health, Medical Imaging, Biomedical Informatics, Tensor Analysis

Lichao Sun
Department of Computer Science and Engineering, Lehigh University

Yonghui Wu
Associate Professor, University of Florida
Natural Language Processing, Machine Learning, Medical Informatics, Pharmacovigilance