Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection

📅 2025-06-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In medical reasoning, large language models (LLMs) are constrained by low-quality intermediate reflection steps. This work proposes a fine-grained self-correction reflection mechanism: it decomposes problems via Tree-of-Thought (ToT), quantitatively evaluates each reasoning and reflection step, and automatically constructs high-quality preference data to drive iterative self-refinement, without requiring expert annotations. The method integrates ToT decomposition, reflection-path generation and scoring, and direct preference optimization (DPO). On MedQA-USMLE, it achieves an average improvement of 4.11% and further improves the 7B/8B state-of-the-art (SOTA) by an additional 4.13%, while demonstrating strong generalization and robustness across multiple medical QA benchmarks. To the authors' knowledge, this is the first end-to-end, quantifiable, annotation-free reflection-enhancement paradigm designed specifically for medical AI.
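The pipeline described above (score each Tree-of-Thought reasoning path step by step, then turn score gaps between paths into preference pairs for DPO) can be sketched as follows. This is a minimal illustration of the general idea, not the paper's actual implementation; the class names, the averaged scoring scheme, and the margin threshold are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Step:
    text: str
    score: float  # hypothetical per-step quality score in [0, 1]

@dataclass
class Path:
    steps: list  # list[Step]: one reasoning/reflection trajectory from the ToT

def path_score(path: Path) -> float:
    """Aggregate step scores into a single path-level quality score."""
    return sum(s.score for s in path.steps) / len(path.steps)

def build_dpo_pairs(question: str, paths: list, margin: float = 0.2) -> list:
    """Pair the best-scoring path (chosen) against clearly worse paths
    (rejected), yielding records in the prompt/chosen/rejected format
    commonly used for DPO training data."""
    ranked = sorted(paths, key=path_score, reverse=True)
    best = ranked[0]
    pairs = []
    for worse in ranked[1:]:
        # Only keep pairs with a clear quality gap, so the preference
        # signal is unambiguous.
        if path_score(best) - path_score(worse) >= margin:
            pairs.append({
                "prompt": question,
                "chosen": " ".join(s.text for s in best.steps),
                "rejected": " ".join(s.text for s in worse.steps),
            })
    return pairs
```

The resulting records match the prompt/chosen/rejected layout that off-the-shelf DPO trainers expect, so no expert annotation enters the loop: the quantitative step scores alone decide which trajectory is preferred.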

📝 Abstract
Large reasoning models have recently made significant strides in mathematical and code reasoning, yet their success has not transferred smoothly to the medical domain. While multiple factors contribute to this disparity, a critical issue is the inadequate focus on the quality of intermediate reflection steps, which is particularly crucial in high-stakes medical scenarios. To address this challenge, we propose Med-REFL, a Medical Reasoning Enhancement via self-corrected Fine-grained refLection. Our method leverages a tree-of-thought approach to decompose medical questions into fine-grained reasoning paths, quantitatively evaluating each step and its subsequent reflections. These assessments enable automatic construction of direct preference optimization data, reducing reliance on expensive expert annotations while guiding models to identify and correct reasoning errors. Experimental results on the MedQA-USMLE benchmark demonstrate Med-REFL achieves consistent improvements, with average gains up to 4.11%. Notably, it further boosts the state-of-the-art performance of 7B/8B models by an additional 4.13%. Furthermore, Med-REFL exhibits strong generalization capabilities and robustness across several challenging medical question-answering datasets. Our work illustrates that prioritizing reflection quality leads to more accurate and trustworthy reasoning in medical AI applications. Checkpoints, code, and data can be found at https://github.com/TianYin123/Med-REFL.
Problem

Research questions and friction points this paper is trying to address.

Enhancing medical reasoning via fine-grained reflection steps
Reducing reliance on expert annotations for error correction
Improving accuracy in high-stakes medical AI applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tree-of-thought approach for medical reasoning
Automatic construction of optimization data
Self-corrected fine-grained reflection steps
Zongxian Yang
City University of Hong Kong (Dongguan)
Jiayu Qian
City University of Hong Kong (Dongguan)
Zegao Peng
City University of Hong Kong (Dongguan)
Haoyu Zhang
City University of Hong Kong (Dongguan)
Zhi-An Huang
City University of Hong Kong (Dongguan)
Artificial Intelligence · Bioinformatics · Medical Image Analysis