CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning

📅 2026-01-19
🤖 AI Summary
This study addresses the poor performance of large language models on medical reasoning tasks in low-resource languages, where maintaining logical correctness and cross-lingual consistency remains challenging. To this end, the authors introduce CUREMED-BENCH, the first high-quality multilingual medical reasoning benchmark covering 13 languages—including low-resource ones such as Amharic and Yoruba—and propose the CURE-MED framework. This framework integrates code-switching-aware supervised fine-tuning, Group Relative Policy Optimization, and curriculum-guided reinforcement learning to jointly enhance medical logical accuracy and multilingual stability. Experimental results demonstrate that the approach achieves language consistency and logical correctness rates of 85.21%/54.35% on 7B models and 94.96%/70.04% on 32B models, substantially outperforming existing baselines.

📝 Abstract
While large language models (LLMs) have been shown to perform well on monolingual mathematical and commonsense reasoning, they remain unreliable for multilingual medical reasoning applications, hindering their deployment in multilingual healthcare settings. We address this by first introducing CUREMED-BENCH, a high-quality multilingual medical reasoning dataset of open-ended reasoning queries, each with a single verifiable answer, spanning thirteen languages, including underrepresented languages such as Amharic, Yoruba, and Swahili. Building on this dataset, we propose CURE-MED, a curriculum-informed reinforcement learning framework that integrates code-switching-aware supervised fine-tuning and Group Relative Policy Optimization to jointly improve logical correctness and language stability. Across thirteen languages, our approach consistently outperforms strong baselines and scales effectively, achieving 85.21% language consistency and 54.35% logical correctness at 7B parameters, and 94.96% language consistency and 70.04% logical correctness at 32B parameters. These results support reliable and equitable multilingual medical reasoning in LLMs. The code and dataset are available at https://cure-med.github.io/
Problem

Research questions and friction points this paper is trying to address.

multilingual medical reasoning
large language models
language consistency
logical correctness
healthcare applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

curriculum-informed reinforcement learning
multilingual medical reasoning
code-switching-aware fine-tuning
Group Relative Policy Optimization
CUREMED-BENCH
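The core of Group Relative Policy Optimization listed above is that, instead of a learned value baseline, each sampled completion's reward is normalized against the other completions drawn for the same prompt. A minimal sketch of that group-relative advantage computation, with purely hypothetical rewards (the paper's actual reward design combining language consistency and logical correctness is not reproduced here):

```python
# Sketch of the group-relative advantage used in GRPO
# (Group Relative Policy Optimization).
# A_i = (r_i - mean(r)) / (std(r) + eps), computed per prompt group.
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each completion's reward against its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled completions of one prompt,
# e.g. scoring answer correctness and target-language consistency.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

By construction the advantages in each group sum to zero, so above-average completions are reinforced and below-average ones are penalized without training a separate critic.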
Authors
- Eric Onyame (University of Virginia)
- Akash Ghosh (IIT-Patna)
- Subhadip Baidya (IIT-Patna)
- Sriparna Saha (IIT-Patna)
- Xiuying Chen (MBZUAI) — Trustworthy NLP, Human-Centered NLP, Computational Social Science
- Chirag Agarwal (Assistant Professor, UVA) — XAI, Trustworthy ML, Artificial Intelligence