To Reason or Not to: Selective Chain-of-Thought in Medical Question Answering

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency of large language models (LLMs) in medical question answering, where unnecessary chain-of-thought (CoT) reasoning is often applied to questions requiring no explicit inference, leading to computational waste and latency. To mitigate this, the authors propose Selective CoT, a strategy that employs a lightweight prediction module to dynamically determine during inference whether a question necessitates explicit reasoning, generating a reasoning chain only when required. The approach requires no architectural modifications, ensuring broad compatibility and ease of deployment. Evaluated on four medical benchmarks—including HeadQA and MedQA-USMLE—across models such as Llama-3.1-8B and Qwen-2.5-7B, Selective CoT reduces inference time by 13–45% and generated tokens by 8–47%, with accuracy degradation of no more than 4%; notably, certain tasks even exhibit simultaneous improvements in both accuracy and efficiency.


📝 Abstract
Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining accuracy. Methods: We propose Selective Chain-of-Thought (Selective CoT), an inference-time strategy that first predicts whether a question requires reasoning and generates a rationale only when needed. Two open-source LLMs (Llama-3.1-8B and Qwen-2.5-7B) were evaluated on four biomedical QA benchmarks: HeadQA, MedQA-USMLE, MedMCQA, and PubMedQA. Metrics included accuracy, total generated tokens, and inference time. Results: Selective CoT reduced inference time by 13–45% and token usage by 8–47% with minimal accuracy loss (≤4%). In some model-task pairs, it achieved both higher accuracy and greater efficiency than standard CoT. Compared with fixed-length CoT, Selective CoT reached similar or superior accuracy at substantially lower computational cost. Discussion: Selective CoT dynamically balances reasoning depth and efficiency by invoking explicit reasoning only when beneficial, reducing redundancy on recall-type questions while preserving interpretability. Conclusion: Selective CoT provides a simple, model-agnostic, and cost-effective approach for medical QA, aligning reasoning effort with question complexity to enhance real-world deployability of LLM-based clinical systems.
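The routing idea in the abstract can be sketched as a small inference loop: a lightweight predictor decides per question whether to prompt for a rationale or for a direct answer. This is a minimal illustration, not the authors' implementation; the predictor and LLM call are stand-in stubs (`needs_reasoning`, `llm_generate`), and all prompt strings and cue words are assumptions.

```python
# Sketch of a Selective CoT inference loop. In the paper, a trained
# lightweight module makes the routing decision and an LLM (e.g.
# Llama-3.1-8B) generates the answer; both are stubbed here so the
# control flow runs standalone.

DIRECT_PROMPT = "Answer with the option letter only.\n{q}"
COT_PROMPT = "Think step by step, then give the option letter.\n{q}"


def needs_reasoning(question: str) -> bool:
    """Placeholder for the lightweight predictor: flag questions that
    look multi-step (hypothetical surface cues, for illustration)."""
    cues = ("calculate", "next step in management", "most likely diagnosis")
    return any(c in question.lower() for c in cues)


def llm_generate(prompt: str) -> str:
    """Stub standing in for an LLM call; always returns option A."""
    return "A"


def selective_cot_answer(question: str) -> tuple[str, str]:
    """Route each question, emitting a rationale prompt only when the
    predictor says explicit reasoning is likely to help."""
    if needs_reasoning(question):
        return "cot", llm_generate(COT_PROMPT.format(q=question))
    return "direct", llm_generate(DIRECT_PROMPT.format(q=question))
```

Recall-type questions skip the rationale entirely, which is where the reported token and latency savings come from; only questions routed to the CoT branch pay the cost of generating a reasoning chain.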
Problem

Research questions and friction points this paper is trying to address.

Medical Question Answering
Chain-of-Thought
Large Language Models
Inference Efficiency
Reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective Chain-of-Thought
Medical Question Answering
Efficiency-Accuracy Trade-off
Inference-time Strategy
Large Language Models
Zaifu Zhan
PhD at University of Minnesota, MS at Tsinghua University
Natural Language Processing · Machine Learning · AI for Biomedicine · Large Language Models

Min Zeng
School of Computer Science and Engineering, Central South University
Bioinformatics · Machine Learning · Deep Learning

Shuang Zhou
University of Minnesota, Hong Kong Polytechnic University
Biomedical Informatics · Large Language Models · AI for Healthcare · Electronic Health Records

Yiran Song
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, 420 Delaware St SE, 55455, Minneapolis, MN, United States

Xiaoyi Chen
Indiana University Bloomington
Machine Learning Security · Backdoor

Yu Hou
University of Minnesota

Yifan Wu
Department of Computer Science and Engineering, University of Minnesota, 200 Union St SE, 55455, Minneapolis, MN, United States

Yang Ruan
PhD of Computer Science, Indiana University
Dimension Reduction · MapReduce

Rui Zhang
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, 420 Delaware St SE, 55455, Minneapolis, MN, United States