Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

High-quality medical reasoning data are scarce, particularly in low-resource scenarios such as rare diseases, which severely limits the complex clinical reasoning capabilities of large language models. To address this challenge, this work proposes MedSSR, a framework that integrates knowledge-guided synthetic data generation with semi-supervised reinforcement learning. The approach first uses rare-disease knowledge to generate reasoning questions with a controllable distribution, then has the policy model label those questions itself to produce high-quality pseudo-labels. Training proceeds in two stages: self-supervised reinforcement learning on the pseudo-labeled synthetic data, followed by supervised reinforcement learning on human-annotated real data. MedSSR avoids costly chain-of-thought distillation, which lowers training cost, and outperforms existing methods across ten medical benchmarks, with gains of up to 5.93% on rare-disease tasks.
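To make "controllable distribution" concrete, here is a minimal sketch of one way a target category mix could drive question synthesis. It is not taken from the MedSSR code: the names `allocate_quota`, `build_prompts`, `knowledge_base`, and `target_mix`, the prompt wording, and the largest-remainder quota rule are all assumptions made for illustration.

```python
import random


def allocate_quota(target_mix, n):
    """Split n questions across categories in proportion to target_mix.

    Hypothetical helper: uses the largest-remainder rule so the quotas
    always sum to exactly n.
    """
    total = sum(target_mix.values())
    raw = {cat: n * weight / total for cat, weight in target_mix.items()}
    quota = {cat: int(value) for cat, value in raw.items()}
    leftover = n - sum(quota.values())
    # Hand the remaining slots to the categories with the largest fractions.
    by_fraction = sorted(raw, key=lambda cat: raw[cat] - quota[cat], reverse=True)
    for cat in by_fraction[:leftover]:
        quota[cat] += 1
    return quota


def build_prompts(knowledge_base, target_mix, n, seed=0):
    """Pair each quota slot with a knowledge entry and a generation prompt.

    knowledge_base maps a category name to a list of knowledge snippets
    (an assumed structure, not the paper's data format).
    """
    rng = random.Random(seed)
    prompts = []
    for cat, count in allocate_quota(target_mix, n).items():
        for _ in range(count):
            entry = rng.choice(knowledge_base[cat])
            prompts.append({
                "category": cat,
                "prompt": f"Write a clinical reasoning question about: {entry}",
            })
    return prompts
```

Raising the weight of a category in `target_mix` shifts more of the synthetic questions toward it, which is one simple way to oversample underrepresented areas such as rare diseases.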

📝 Abstract

While large language models hold promise for complex medical applications, their development is hindered by the scarcity of high-quality reasoning data. To address this issue, existing approaches typically distill chain-of-thought reasoning traces from large proprietary models via supervised fine-tuning, then conduct reinforcement learning (RL). These methods exhibit limited improvement on underrepresented domains like rare diseases while incurring substantial costs from generating complex reasoning chains. To efficiently enhance medical reasoning, we propose MedSSR, a Medical Knowledge-enhanced data Synthesis and Semi-supervised Reinforcement learning framework. Our framework first employs rare disease knowledge to synthesize distribution-controllable reasoning questions. We then utilize the policy model itself to generate high-quality pseudo-labels. This enables a two-stage, intrinsic-to-extrinsic training paradigm: self-supervised RL on the pseudo-labeled synthetic data, followed by supervised RL on the human-annotated real data. MedSSR scales model training efficiently without relying on costly trace distillation. Extensive experiments on Qwen and Llama demonstrate that our method outperforms existing methods across ten medical benchmarks, achieving up to +5.93% gain on rare-disease tasks. Our code is available at https://github.com/tdlhl/MedSSR.
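The two-stage paradigm can be pictured with a small sketch. The abstract does not say how the policy model's own outputs become pseudo-labels, so the majority vote below is an assumption, as are the names `pseudo_label`, `exact_match_reward`, and the `min_agreement` threshold. The only point it illustrates is that both stages can share one reward function and differ in where the reference answer comes from.

```python
from collections import Counter


def pseudo_label(sampled_answers, min_agreement=0.5):
    """Pick a pseudo-label from answers sampled from the policy model.

    Assumed rule (not confirmed by the source): take the most frequent
    answer and keep it only if enough samples agree. Returns
    (label_or_None, agreement).
    """
    if not sampled_answers:
        return None, 0.0
    label, votes = Counter(sampled_answers).most_common(1)[0]
    agreement = votes / len(sampled_answers)
    return (label if agreement >= min_agreement else None), agreement


def exact_match_reward(predicted, reference):
    """Binary reward: 1.0 when the final answer matches the reference."""
    return 1.0 if predicted == reference else 0.0


# Stage 1 (self-supervised RL): the reference is the model's own pseudo-label.
label, agreement = pseudo_label(["B", "B", "A", "B"])
stage1_reward = exact_match_reward("B", label)

# Stage 2 (supervised RL): the reference is a human-annotated answer.
stage2_reward = exact_match_reward("C", "D")
```

Questions whose samples disagree too much receive no pseudo-label in this sketch and would simply be skipped during the first stage.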

Problem

Research questions and friction points this paper is trying to address.

medical reasoning

data scarcity

rare diseases

knowledge-enhanced synthesis

semi-supervised learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge-enhanced data synthesis

semi-supervised reinforcement learning

medical reasoning