PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis

📅 2026-01-12
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses a limitation of existing multimodal medical models: they struggle to handle the heterogeneous inputs and ongoing contextual understanding that real-world clinical settings demand, where diagnosis unfolds over multi-turn physician-patient interactions. To bridge this gap, the authors introduce MediScope, a large-scale dataset of 98,000 multi-turn clinical consultation records and 601,500 medical images. They also introduce the PulseMind Benchmark, a new evaluation framework designed to reflect authentic diagnostic workflows. They further propose Comparison-based Reinforcement Policy Optimization (CRPO), a training paradigm that integrates multimodal alignment, multi-turn dialogue modeling, and human preference learning through relative feedback. The approach achieves competitive diagnostic accuracy and interaction quality on the PulseMind Benchmark and on multiple public medical benchmarks. It is presented as the first systematic integration of large-scale clinical dialogues, a four-dimensional evaluation protocol, and preference-based reinforcement learning in medical AI.

πŸ“ Abstract
Recent advances in medical multi-modal models focus on specialized image analysis in domains such as dermatology, pathology, or radiology. However, they do not fully capture the complexity of real-world clinical diagnostics, which involve heterogeneous inputs and require ongoing contextual understanding during patient-physician interactions. To bridge this gap, we introduce PulseMind, a new family of multi-modal diagnostic models built on three components: a systematically curated dataset, a comprehensive evaluation benchmark, and a tailored training framework. Specifically, we first construct a diagnostic dataset, MediScope, which comprises 98,000 real-world multi-turn consultations and 601,500 medical images, spanning more than 10 major clinical departments and more than 200 sub-specialties. Then, to better reflect the requirements of real-world clinical diagnosis, we develop the PulseMind Benchmark. It is a multi-turn diagnostic consultation benchmark with a four-dimensional evaluation protocol covering proactiveness, accuracy, usefulness, and language quality. Finally, we design a training framework tailored for multi-modal clinical diagnostics, centered on a core component named Comparison-based Reinforcement Policy Optimization (CRPO). Compared with absolute score rewards, CRPO uses relative preference signals derived from multi-dimensional comparisons to provide more stable, human-aligned training guidance. Extensive experiments demonstrate that PulseMind achieves competitive performance on both the diagnostic consultation benchmark and public medical benchmarks.
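The abstract says CRPO replaces absolute score rewards with relative preference signals from multi-dimensional comparisons, but it gives no formula. The sketch below shows one plausible way such signals could become per-response rewards. It assumes a GRPO-style setup:

- several responses are sampled for the same consultation;
- a judge compares every pair of responses on each of the four benchmark dimensions;
- each response's reward is its win rate over those comparisons;
- rewards are normalized within the group.

The judge interface, `comparison_rewards`, `group_advantages`, `make_score_judge`, and the toy scores are hypothetical illustrations, not the paper's CRPO implementation.

```python
import itertools
import statistics
from typing import Callable, Dict, List, Sequence

# Hypothetical sketch of comparison-based rewards; NOT the authors' CRPO code.
# The four evaluation dimensions named in the PulseMind Benchmark protocol.
DIMENSIONS = ("proactiveness", "accuracy", "usefulness", "language_quality")

# Hypothetical judge interface: judge(a, b, dim) returns +1 if response a is
# preferred to b on `dim`, -1 if b is preferred, and 0 for a tie.
Judge = Callable[[str, str, str], int]


def comparison_rewards(responses: Sequence[str], judge: Judge) -> List[float]:
    """Win rate of each response over all pairwise, per-dimension comparisons."""
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two responses to compare")
    wins = [0.0] * n
    for i, j in itertools.combinations(range(n), 2):
        for dim in DIMENSIONS:
            outcome = judge(responses[i], responses[j], dim)
            if outcome > 0:
                wins[i] += 1.0
            elif outcome < 0:
                wins[j] += 1.0
            else:  # a tie splits the credit
                wins[i] += 0.5
                wins[j] += 0.5
    per_response = (n - 1) * len(DIMENSIONS)  # comparisons each response enters
    return [w / per_response for w in wins]


def group_advantages(rewards: Sequence[float]) -> List[float]:
    """Normalize rewards within the sampled group (zero mean, unit std)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0] * len(rewards)  # no preference signal in this group
    return [(r - mean) / std for r in rewards]


def make_score_judge(scores: Dict[str, Dict[str, float]], offset: float = 0.0) -> Judge:
    """Toy judge that compares hypothetical per-dimension scores.

    `offset` simulates a judge that inflates every score by the same amount.
    """
    def judge(a: str, b: str, dim: str) -> int:
        sa, sb = scores[a][dim] + offset, scores[b][dim] + offset
        return (sa > sb) - (sa < sb)
    return judge


if __name__ == "__main__":
    toy_scores = {
        "r1": dict.fromkeys(DIMENSIONS, 4.0),
        "r2": dict.fromkeys(DIMENSIONS, 2.0),
        "r3": dict.fromkeys(DIMENSIONS, 3.0),
    }
    responses = ["r1", "r2", "r3"]
    rewards = comparison_rewards(responses, make_score_judge(toy_scores))
    shifted = comparison_rewards(responses, make_score_judge(toy_scores, offset=5.0))
    print(rewards, shifted)  # identical: the shift does not change any comparison
    print(group_advantages(rewards))
```

Each reward depends only on which response wins each comparison. A judge that inflates every score by the same amount therefore leaves the rewards unchanged, as the `offset` in the demo shows. The abstract does not say whether this is the kind of stability the authors have in mind.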
Problem

Research questions and friction points this paper is trying to address.

multi-modal medical model
clinical diagnosis
heterogeneous inputs
contextual understanding
real-world diagnostics
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal medical model
multi-turn diagnostic consultation
Comparison-based Reinforcement Policy Optimization
clinical diagnosis benchmark
real-world medical dataset