AI Summary
This study addresses the significant language gap in multilingual medical reasoning, where low-resource languages perform substantially worse than English, hindering the equitable deployment of medical AI. To bridge this gap, the authors propose a language-aware collaborative reasoning framework that generates reasoning paths in both English and the local language in parallel, abstracts them into structured conceptual representations, and fuses local clinical knowledge with the English logical scaffold at the conceptual level. This approach introduces, for the first time, a language collaboration mechanism that leverages concept-level alignment, structured knowledge integration, and model distillation to markedly improve both medical reasoning accuracy and cultural appropriateness for low-resource languages. Evaluated on three multilingual medical benchmarks, the method achieves an average performance gain of 5%, with expert assessments confirming its clinical validity and cultural alignment.
Abstract
While reasoning-enhanced large language models perform strongly on English medical tasks, a persistent multilingual gap remains, with substantially weaker reasoning in local languages, limiting equitable global medical deployment. To bridge this gap, we introduce Med-CoReasoner, a language-informed co-reasoning framework that elicits parallel English and local-language reasoning, abstracts them into structured concepts, and integrates local clinical knowledge into an English logical scaffold via concept-level alignment and retrieval. This design combines the structural robustness of English reasoning with the practice-grounded expertise encoded in local languages. To evaluate multilingual medical reasoning beyond multiple-choice settings, we construct MultiMed-X, a benchmark covering seven languages with expert-annotated long-form question answering and natural language inference tasks, comprising 350 instances per language. Experiments across three benchmarks show that Med-CoReasoner improves multilingual reasoning performance by an average of 5%, with particularly substantial gains in low-resource languages. Moreover, model distillation and expert evaluation analysis further confirm that Med-CoReasoner produces clinically sound and culturally grounded reasoning traces.
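The co-reasoning flow described above (parallel traces → concept abstraction → concept-level fusion into the English scaffold) can be sketched in simplified form. This is an illustrative toy, not the paper's implementation: all function names and the string-based concept representation are assumptions for exposition, standing in for the LLM generation, alignment, and retrieval steps Med-CoReasoner actually uses.

```python
# Illustrative sketch of the Med-CoReasoner co-reasoning flow (assumed
# structure, not the authors' code). Real concept abstraction and
# alignment would be done by an LLM plus retrieval, not string parsing.

def abstract_to_concepts(reasoning_steps):
    """Abstract free-text reasoning steps into structured concepts.
    Toy stand-in: treat each 'topic: detail' step as a (topic, detail) pair."""
    concepts = {}
    for step in reasoning_steps:
        topic, _, detail = step.partition(": ")
        concepts[topic] = detail
    return concepts

def fuse(english_concepts, local_concepts):
    """Fuse local clinical knowledge into the English logical scaffold:
    keep the English concept ordering (the scaffold), but where a concept
    aligns across languages, prefer the local-language detail."""
    fused = []
    for topic, en_detail in english_concepts.items():
        fused.append((topic, local_concepts.get(topic, en_detail)))
    return fused

# Toy traces: the English trace supplies the logical structure; the
# local-language trace (shown here already translated) contributes
# practice-grounded detail for the aligned 'treatment' concept.
english_trace = [
    "symptom: fever and cough",
    "diagnosis: influenza",
    "treatment: oseltamivir",
]
local_trace = [
    "treatment: oseltamivir, adjusted for locally available formulations",
]

fused = fuse(abstract_to_concepts(english_trace),
             abstract_to_concepts(local_trace))
for topic, detail in fused:
    print(f"{topic} -> {detail}")
```

The design choice mirrored here is that fusion happens at the concept level rather than by mixing raw text: the English trace fixes the reasoning skeleton, and aligned local concepts override only where local knowledge adds specificity.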