AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration

📅 2025-09-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Medical large language models (LLMs) commonly reason at a fixed depth, over-reasoning on simple questions and incurring unnecessary computational cost. To address this, we propose the first end-to-end adaptive reasoning framework for medical LLMs, built around an uncertainty-guided, dynamic chain-of-thought (CoT) length calibration mechanism. The method combines multi-candidate generation, uncertainty quantification, problem difficulty estimation, and reinforcement learning to automatically suppress or extend reasoning depth. Empirically, the model spontaneously bifurcates into “thinking” and “non-thinking” modes, enabling fine-grained, input-adaptive reasoning control. Across six medical QA benchmarks, the approach reduces average reasoning length by up to 6.4× with only minimal performance degradation, yielding a markedly better efficiency–accuracy trade-off.

📝 Abstract
Recent advances in inference-time scaling with extended long chain-of-thought have significantly improved the reasoning capabilities of both general and medical large language models (LLMs). However, these models tend to engage in lengthy reasoning processes regardless of the difficulty of the input question, leading to increased inference costs in real-world applications. Therefore, enabling adaptive thinking, where models think less for simpler questions and think more for complex ones, is critical for the effective use of medical LLMs in practice. Despite its importance, there is a lack of end-to-end approaches designed to enhance the adaptive thinking capabilities of medical LLMs while providing a comprehensive examination of the trade-off between performance and computational cost. To bridge this gap, we propose AdaThink-Med, the first end-to-end framework designed to enhance adaptive thinking ability in medical reasoning models with uncertainty-guided length calibration. AdaThink-Med first generates multiple candidate outputs for each question, evaluates the correctness and uncertainty of each candidate, and then estimates problem difficulty via an uncertainty-guided length calibration module. For outputs with low difficulty and correct answers, the framework penalizes longer reasoning paths; for those with high difficulty and incorrect answers, it encourages extending the chain of thought to explore alternative solutions. On six public medical QA benchmarks, AdaThink-Med achieves up to 6.4x length reduction on average while retaining performance with only minimal degradation. Intriguingly, we observe that AdaThink-Med spontaneously develops two distinct reasoning modes, which we characterize as "non-thinking" and "thinking", demonstrating the model's ability to suppress redundant reasoning processes dynamically.
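The abstract's difficulty-estimation step, sampling multiple candidates and combining their correctness with answer uncertainty, can be sketched as follows. This is a minimal illustration, not the paper's exact module: the 50/50 weighting of error rate and normalized answer-distribution entropy is an assumption for demonstration.

```python
from collections import Counter
import math

def estimate_difficulty(candidate_answers, gold_answer):
    """Hypothetical difficulty score from N sampled candidate answers.

    Combines the error rate across candidates with the entropy of the
    answer distribution: many wrong or disagreeing candidates suggest
    a harder question. Returns a value in [0, 1].
    """
    n = len(candidate_answers)
    accuracy = sum(a == gold_answer for a in candidate_answers) / n
    counts = Counter(candidate_answers)
    # Shannon entropy of the empirical answer distribution.
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    max_entropy = math.log(n) if n > 1 else 1.0
    uncertainty = entropy / max_entropy  # normalized to [0, 1]
    # Assumed equal weighting of the two difficulty signals.
    return 0.5 * (1 - accuracy) + 0.5 * uncertainty
```

Unanimously correct candidates yield difficulty 0; unanimous disagreement with no correct answer yields difficulty 1.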
Problem

Research questions and friction points this paper is trying to address.

Medical LLMs lack adaptive thinking for varying question difficulty
Current models use lengthy reasoning regardless of input complexity
No end-to-end framework balances performance and computational cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty-guided length calibration for adaptive thinking
Generates multiple candidates and evaluates correctness uncertainty
Penalizes long reasoning on easy questions; encourages longer reasoning on hard ones
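The reward shaping described above can be sketched as a toy function. This is a hedged illustration of the stated idea only: the `alpha` penalty weight, the difficulty `threshold`, and the linear length terms are assumptions, not the paper's actual reward.

```python
def length_calibrated_reward(correct, difficulty, length, ref_length,
                             alpha=0.5, threshold=0.5):
    """Toy length-calibrated reward.

    Easy (low-difficulty) questions answered correctly are penalized for
    exceeding a reference reasoning length; hard questions answered
    incorrectly are rewarded for exploring longer chains of thought.
    All parameters here are illustrative assumptions.
    """
    base = 1.0 if correct else 0.0
    # Relative length overshoot versus a reference CoT length.
    excess = (length - ref_length) / max(ref_length, 1)
    if correct and difficulty < threshold:
        # Suppress over-reasoning on easy, solved questions.
        return base - alpha * max(excess, 0.0)
    if not correct and difficulty >= threshold:
        # Encourage extended exploration on hard, unsolved questions.
        return base + alpha * min(max(excess, 0.0), 1.0)
    return base
```

Under this sketch, a correct answer to an easy question loses reward as its chain grows past the reference length, while an incorrect answer to a hard question gains a (capped) bonus for thinking longer.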