Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation

📅 2025-05-29
🤖 AI Summary
Deploying medical multimodal large language models (MLLMs) faces several challenges: high computational resource demands, weak diagnostic robustness, poor clinical adaptability, and privacy compliance risks. To address them, this work proposes a low-resource, high-credibility medical MLLM framework. Methodologically, it introduces a novel “triple-integration” paradigm: (1) a minimal yet high-quality supervised fine-tuning (SFT) data construction strategy; (2) a clinical-knowledge-guided cross-modal reasoning enhancement mechanism; and (3) a modular evaluation framework covering diverse modalities and clinical tasks. The framework integrates multimodal alignment modeling with an interpretable reasoning architecture. Experiments demonstrate state-of-the-art performance on general medical reasoning benchmarks, substantial reduction in training cost, rapid clinical domain adaptation, and support for privacy-preserving deployment compliant with healthcare regulations.

📝 Abstract
Multimodal large language models (MLLMs) have demonstrated promising prospects in healthcare, particularly for addressing complex medical tasks, supporting multidisciplinary treatment (MDT), and enabling personalized precision medicine. However, their practical deployment faces critical challenges in resource efficiency, diagnostic accuracy, clinical considerations, and ethical privacy. To address these limitations, we propose Infi-Med, a comprehensive framework for medical MLLMs that introduces three key innovations: (1) a resource-efficient approach through curating and constructing high-quality supervised fine-tuning (SFT) datasets with minimal sample requirements, with a forward-looking design that extends to both pretraining and posttraining phases; (2) enhanced multimodal reasoning capabilities for cross-modal integration and clinical task understanding; and (3) a systematic evaluation system that assesses model performance across medical modalities and task types. Our experiments demonstrate that Infi-Med achieves state-of-the-art (SOTA) performance in general medical reasoning while maintaining rapid adaptability to clinical scenarios. The framework establishes a solid foundation for deploying MLLMs in real-world healthcare settings by balancing model effectiveness with operational constraints.
Problem

Research questions and friction points this paper is trying to address.

Improving resource efficiency in medical MLLM deployment
Enhancing multimodal reasoning for clinical task understanding
Ensuring robust evaluation across medical modalities and tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Resource-efficient SFT datasets with minimal samples
Enhanced multimodal reasoning for clinical tasks
Systematic evaluation across medical modalities
Authors
Zeyu Liu
The Hong Kong Polytechnic University
Zhitian Hou
Sun Yat-sen University, Reallm Labs
Yining Di
The Hong Kong University of Science and Technology
Kejing Yang
Reallm Labs
Zhijie Sang
Microsoft
NLP
Congkai Xie
Reallm Labs
Jingwen Yang
The Hong Kong Polytechnic University
Siyuan Liu
The Hong Kong Polytechnic University
Jialu Wang
Tongji University
Chunming Li
Shanghai Jiao Tong University
Ming Li
The Hong Kong Polytechnic University
Hongxia Yang
Professor, HK Polytechnic University
Machine Learning, Generative AI, Cognitive Intelligence, Statistical Modeling