🤖 AI Summary
Existing medical reinforcement fine-tuning (RFT) methods are confined to close-ended visual question answering (VQA), limiting their applicability to open-ended, reasoning-intensive clinical decision-making. To address this, we propose the first curriculum-aware multimodal reinforcement learning framework for medicine: a curriculum-driven RFT paradigm that unifies training on close-ended and open-ended medical VQA. Our method integrates rule-verifiable reward modeling, multi-stage curriculum learning, cross-modal alignment fine-tuning, and medical-domain-adaptive reward design, enabling progressive capability advancement from discriminative recognition to knowledge-grounded reasoning and clinically interpretable outputs. Evaluated across eight medical VQA benchmarks, our approach achieves state-of-the-art performance: an 11.4% in-domain accuracy gain, a 5.7% improvement in cross-domain generalization, and markedly better robustness and interpretability in clinical reasoning.
📝 Abstract
Recent advances in reinforcement learning with verifiable, rule-based rewards have greatly enhanced the reasoning capabilities and out-of-distribution generalization of VLMs/LLMs, obviating the need for manually crafted reasoning chains. Despite these promising developments in the general domain, their translation to medical imaging remains limited. Current medical reinforcement fine-tuning (RFT) methods predominantly focus on close-ended VQA, thereby restricting the model's ability to engage in world knowledge retrieval and flexible task adaptation. More critically, these methods fall short of the clinical demand for open-ended, reasoning-intensive decision-making. To bridge this gap, we introduce **MedCCO**, the first multimodal reinforcement learning framework tailored for medical VQA that unifies close-ended and open-ended data within a curriculum-driven RFT paradigm. Specifically, MedCCO is initially fine-tuned on a diverse set of close-ended medical VQA tasks to establish domain-grounded reasoning capabilities, and is then progressively adapted to open-ended tasks to foster deeper knowledge enhancement and clinical interpretability. We validate MedCCO across eight challenging medical VQA benchmarks, spanning both close-ended and open-ended settings. Experimental results show that MedCCO consistently enhances performance and generalization, achieving an 11.4% accuracy gain across three in-domain tasks and a 5.7% improvement on five out-of-domain benchmarks. These findings highlight the promise of curriculum-guided RL in advancing robust, clinically relevant reasoning in medical multimodal language models.
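To make the idea of rule-verifiable rewards concrete, the following is a minimal sketch of how a reward function could unify close- and open-ended VQA in one RFT loop. The abstract does not specify MedCCO's actual reward design; the `<answer>` tag convention, exact-match scoring for close-ended questions, and token-level F1 for open-ended answers are all illustrative assumptions, not the paper's method.

```python
import re

def rule_based_reward(response: str, ground_truth: str, open_ended: bool = False) -> float:
    """Hypothetical verifiable reward for medical VQA RFT.

    Close-ended: 1.0 for an exact match on the answer label, else 0.0.
    Open-ended: token-level F1 against the reference answer in [0, 1].
    """
    # Assume the model is prompted to wrap its final answer in <answer> tags;
    # fall back to the raw response if the tag is missing.
    m = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    answer = (m.group(1) if m else response).strip().lower()
    truth = ground_truth.strip().lower()

    if not open_ended:
        # Close-ended VQA: exact match on the option letter or short label.
        return 1.0 if answer == truth else 0.0

    # Open-ended VQA: bag-of-words F1 between prediction and reference.
    pred_tokens, ref_tokens = answer.split(), truth.split()
    common = set(pred_tokens) & set(ref_tokens)
    if not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Because both branches return a scalar in [0, 1] that is checkable by rules alone, a single policy-gradient trainer could consume either data type, which is what makes a curriculum that moves from close-ended to open-ended stages straightforward to implement.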