Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications

📅 2025-08-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current large language models (LLMs) lack systematic, transparent, and verifiable reasoning capabilities in medical applications, hindering clinical deployment. Method: We systematically review 60 key works from 2022–2025 and propose the first taxonomy of LLM-enhancement techniques for medical reasoning, categorizing methods into training-time (e.g., supervised fine-tuning, reinforcement learning) and test-time (e.g., prompt engineering, multi-agent collaboration) approaches, spanning multimodal inputs including text, medical imaging, and code. Contribution/Results: Our analysis identifies core challenges, notably the “faithfulness–plausibility gap,” and advocates shifting evaluation paradigms from accuracy-centric metrics toward reasoning quality, process interpretability, and formal verifiability. The taxonomy also charts a path toward native multimodal medical reasoning and provides a theoretical foundation and practical roadmap for developing clinically trustworthy AI systems.
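To make the training-time side of the taxonomy concrete, below is a minimal sketch of how a supervised fine-tuning record with an explicit reasoning trace might be formatted. The chat-message schema, the field names, and the clinical vignette are illustrative assumptions, not artifacts from the surveyed papers.

```python
# Hypothetical example: packing a clinical question, a step-by-step rationale,
# and a final answer into one SFT record in the chat-message format used by
# many instruction-tuning pipelines. All names and the case are illustrative.

import json

def make_sft_record(question: str, reasoning_steps: list[str], answer: str) -> dict:
    """Build one training record that pairs a question with a reasoning trace."""
    rationale = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(reasoning_steps))
    return {
        "messages": [
            {"role": "system",
             "content": "You are a clinical reasoning assistant. Think step by step before answering."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": f"{rationale}\nFinal answer: {answer}"},
        ]
    }

record = make_sft_record(
    question=("A 54-year-old presents with crushing substernal chest pain and "
              "ST elevation in leads II, III, aVF. Most likely diagnosis?"),
    reasoning_steps=[
        "ST elevation localizes the injury; leads II, III, aVF reflect the inferior wall.",
        "Crushing substernal pain with ST elevation indicates an acute myocardial infarction.",
    ],
    answer="Inferior ST-elevation myocardial infarction (STEMI)",
)
print(json.dumps(record, indent=2))
```

Records like this supervise the intermediate steps alongside the answer, which is what distinguishes reasoning-oriented fine-tuning from plain answer supervision.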

📝 Abstract
The proliferation of Large Language Models (LLMs) in medicine has enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning, a cornerstone of clinical practice. This has catalyzed a shift from single-step answer generation to the development of LLMs explicitly designed for medical reasoning. This paper provides the first systematic review of this emerging field. We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies (e.g., supervised fine-tuning, reinforcement learning) and test-time mechanisms (e.g., prompt engineering, multi-agent systems). We analyze how these techniques are applied across different data modalities (text, image, code) and in key clinical applications such as diagnosis, education, and treatment planning. Furthermore, we survey the evolution of evaluation benchmarks from simple accuracy metrics to sophisticated assessments of reasoning quality and visual interpretability. Based on an analysis of 60 seminal studies from 2022–2025, we conclude by identifying critical challenges, including the faithfulness–plausibility gap and the need for native multimodal reasoning, and outlining future directions toward building efficient, robust, and sociotechnically responsible medical AI.
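As a counterpart on the test-time side, the sketch below shows zero-shot chain-of-thought prompting, one of the prompt-engineering mechanisms the review covers. The template wording and the sample question are assumptions for illustration; the paper surveys such techniques rather than prescribing a specific prompt.

```python
# A minimal sketch of test-time chain-of-thought prompting for a medical
# multiple-choice question. The template text is an illustrative assumption;
# the rendered string can be passed to any chat-capable LLM.

COT_TEMPLATE = (
    "You are assisting with a medical question. Reason step by step, citing the\n"
    "clinical findings that support each step, then state your final answer on a\n"
    "line beginning with 'Answer:'.\n\n"
    "Question: {question}\n"
    "Options: {options}\n"
    "Let's think step by step."
)

def build_cot_prompt(question: str, options: list[str]) -> str:
    """Render a chain-of-thought prompt for the given question and options."""
    return COT_TEMPLATE.format(question=question, options=", ".join(options))

print(build_cot_prompt(
    "Which electrolyte abnormality most commonly causes peaked T waves?",
    ["Hyperkalemia", "Hypokalemia", "Hypercalcemia", "Hyponatremia"],
))
```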
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs for systematic medical reasoning
Addressing gaps in clinical reasoning transparency
Improving multimodal medical AI applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-time strategies enhance medical reasoning
Test-time mechanisms improve reasoning transparency (see the multi-agent sketch after this list)
Multimodal benchmarks evaluate reasoning quality
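The following toy sketch illustrates the multi-agent collaboration idea referenced above: several simulated specialist agents answer independently and a simple majority vote aggregates their opinions. The query_llm stub, the specialties, and the voting rule are hypothetical stand-ins, not the paper's method.

```python
# Toy sketch of test-time multi-agent collaboration: each "specialist" agent
# contributes an answer and the panel returns the majority opinion.

from collections import Counter

def query_llm(role: str, question: str) -> str:
    """Stub standing in for a real LLM call; it ignores the question and
    returns a canned opinion per specialty for demonstration purposes."""
    canned = {
        "pulmonologist": "Community-acquired pneumonia",
        "infectious disease specialist": "Community-acquired pneumonia",
        "radiologist": "Pulmonary tuberculosis",
    }
    return canned[role]

def panel_diagnose(question: str, roles: list[str]) -> str:
    """Collect one answer per specialist agent, then return the majority answer."""
    votes = Counter(query_llm(role, question) for role in roles)
    answer, _count = votes.most_common(1)[0]
    return answer

question = ("65-year-old with fever, productive cough, and right lower lobe "
            "consolidation on chest X-ray: most likely diagnosis?")
print(panel_diagnose(question, [
    "pulmonologist", "infectious disease specialist", "radiologist",
]))  # -> Community-acquired pneumonia (2 of 3 votes)
```

In a real system the stub would be replaced by model calls with role-specific system prompts, and aggregation can be richer than voting, for example a moderator agent that adjudicates disagreements.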
👥 Authors
Wenxuan Wang (Renmin University of China)
Zizhan Ma (The Chinese University of Hong Kong)
Meidan Ding (Shenzhen University): computer vision, medical image analysis
Shiyi Zheng (Shenzhen University)
Shengyuan Liu (The Chinese University of Hong Kong)
Jie Liu (City University of Hong Kong)
Jiaming Ji (Peking University)
Wenting Chen (City University of Hong Kong)
Xiang Li (Massachusetts General Hospital and Harvard Medical School)
Linlin Shen (Shenzhen University): Deep Learning, Computer Vision, Facial Analysis/Recognition, Medical Image Analysis
Yixuan Yuan (The Chinese University of Hong Kong): Medical image analysis, AI in healthcare, Brain data analysis, Endoscopy