🤖 AI Summary
This work addresses the inefficiency of large language models (LLMs) in medical question answering, where chain-of-thought (CoT) reasoning is often applied even to questions that require no explicit inference, wasting computation and adding latency. To mitigate this, the authors propose Selective CoT, a strategy that uses a lightweight prediction module to decide at inference time whether a question needs explicit reasoning, generating a reasoning chain only when it does. The approach requires no architectural modifications, ensuring broad compatibility and ease of deployment. Evaluated on four medical benchmarks, including HeadQA and MedQA-USMLE, with models such as Llama-3.1-8B and Qwen-2.5-7B, Selective CoT reduces inference time by 13–45% and generated tokens by 8–47% with accuracy degradation of no more than 4%; on some tasks it even improves both accuracy and efficiency.
📝 Abstract
Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining accuracy.
Methods: We propose Selective Chain-of-Thought (Selective CoT), an inference-time strategy that first predicts whether a question requires reasoning and generates a rationale only when needed. Two open-source LLMs (Llama-3.1-8B and Qwen-2.5-7B) were evaluated on four biomedical QA benchmarks: HeadQA, MedQA-USMLE, MedMCQA, and PubMedQA. Metrics included accuracy, total generated tokens, and inference time.
Results: Selective CoT reduced inference time by 13–45% and token usage by 8–47% with minimal accuracy loss (≤4%). In some model-task pairs, it achieved both higher accuracy and greater efficiency than standard CoT. Compared with fixed-length CoT, Selective CoT reached similar or superior accuracy at substantially lower computational cost.
Discussion: Selective CoT dynamically balances reasoning depth and efficiency by invoking explicit reasoning only when beneficial, reducing redundancy on recall-type questions while preserving interpretability.
Conclusion: Selective CoT provides a simple, model-agnostic, and cost-effective approach for medical QA, aligning reasoning effort with question complexity to enhance real-world deployability of LLM-based clinical systems.
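The selective-inference control flow described in the abstract can be sketched as follows. This is a minimal illustration only: the function and prompt wordings here (`needs_reasoning`, the step-by-step and direct-answer prompts) are hypothetical, and the paper's actual lightweight prediction module and prompting details are not specified in this summary.

```python
def answer(question, llm, needs_reasoning):
    """Selective CoT sketch: elicit a chain of thought only when a
    lightweight predictor flags the question as requiring reasoning."""
    if needs_reasoning(question):
        # Reasoning path: prompt the model for step-by-step rationale.
        prompt = f"{question}\nLet's think step by step."
    else:
        # Direct path: answer immediately, saving tokens and latency.
        prompt = f"{question}\nAnswer directly."
    return llm(prompt)

# Toy stubs to show the routing (not the paper's models or predictor):
llm = lambda p: p.splitlines()[-1]          # echoes the prompt's last line
needs_reasoning = lambda q: "why" in q.lower()

print(answer("Why does X cause Y?", llm, needs_reasoning))
print(answer("What is the recommended dose?", llm, needs_reasoning))
```

The point of the sketch is that routing happens per question at inference time, with no change to the underlying model, which is why the approach is model-agnostic.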