🤖 AI Summary
Existing medical reinforcement fine-tuning (RFT) methods are confined to close-ended visual question answering (VQA), limiting their applicability to open-ended, reasoning-intensive clinical decision-making. To address this, we propose the first curriculum-aware multimodal reinforcement learning framework for medicine: a curriculum-driven RFT paradigm that unifies training on close-ended and open-ended medical VQA. Our method integrates rule-verifiable reward modeling, multi-stage curriculum learning, cross-modal alignment fine-tuning, and medical-domain-adaptive reward design, enabling progressive capability advancement from discriminative recognition to knowledge-grounded reasoning and clinically interpretable outputs. Evaluated across eight medical VQA benchmarks, our approach achieves state-of-the-art performance: an 11.4% in-domain accuracy gain, a 5.7% improvement in cross-domain generalization, and markedly better robustness and interpretability in clinical reasoning.
📝 Abstract
Recent advances in reinforcement learning with verifiable, rule-based rewards have greatly enhanced the reasoning capabilities and out-of-distribution generalization of VLMs/LLMs, obviating the need for manually crafted reasoning chains. Despite these promising developments in the general domain, their translation to medical imaging remains limited. Current medical reinforcement fine-tuning (RFT) methods predominantly focus on close-ended VQA, thereby restricting the model's ability to engage in world knowledge retrieval and flexible task adaptation. More critically, these methods fall short of the clinical demand for open-ended, reasoning-intensive decision-making. To bridge this gap, we introduce **MedCCO**, the first multimodal reinforcement learning framework tailored for medical VQA that unifies close-ended and open-ended data within a curriculum-driven RFT paradigm. Specifically, MedCCO is initially fine-tuned on a diverse set of close-ended medical VQA tasks to establish domain-grounded reasoning capabilities, and is then progressively adapted to open-ended tasks to foster deeper knowledge enhancement and clinical interpretability. We validate MedCCO across eight challenging medical VQA benchmarks, spanning both close-ended and open-ended settings. Experimental results show that MedCCO consistently enhances performance and generalization, achieving an 11.4% accuracy gain across three in-domain tasks and a 5.7% improvement on five out-of-domain benchmarks. These findings highlight the promise of curriculum-guided RL in advancing robust, clinically relevant reasoning in medical multimodal language models.
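To make the idea of rule-verifiable rewards concrete, the following is a minimal sketch of how a reward function could unify close- and open-ended VQA in one RFT loop. The abstract does not specify MedCCO's actual reward design; the `<answer>` tag convention, exact-match scoring for close-ended questions, and token-level F1 for open-ended answers are all illustrative assumptions, not the paper's method.

```python
import re

def rule_based_reward(response: str, ground_truth: str, open_ended: bool = False) -> float:
    """Hypothetical verifiable reward for medical VQA RFT.

    Close-ended: 1.0 for an exact match on the answer label, else 0.0.
    Open-ended: token-level F1 against the reference answer in [0, 1].
    """
    # Assume the model is prompted to wrap its final answer in <answer> tags;
    # fall back to the raw response if the tag is missing.
    m = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    answer = (m.group(1) if m else response).strip().lower()
    truth = ground_truth.strip().lower()

    if not open_ended:
        # Close-ended VQA: exact match on the option letter or short label.
        return 1.0 if answer == truth else 0.0

    # Open-ended VQA: bag-of-words F1 between prediction and reference.
    pred_tokens, ref_tokens = answer.split(), truth.split()
    common = set(pred_tokens) & set(ref_tokens)
    if not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Because both branches return a scalar in [0, 1] that is checkable by rules alone, a single policy-gradient trainer could consume either data type, which is what makes a curriculum that moves from close-ended to open-ended stages straightforward to implement.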