🤖 AI Summary
This work addresses the inefficiency of large language models (LLMs) in medical question answering, where chain-of-thought (CoT) reasoning is often applied even to questions that require no explicit inference, wasting computation and adding latency. To mitigate this, the authors propose Selective CoT, a strategy that uses a lightweight prediction module to decide at inference time whether a question needs explicit reasoning, generating a reasoning chain only when it does. The approach requires no architectural modifications, ensuring broad compatibility and ease of deployment. Evaluated on four medical benchmarks, including HeadQA and MedQA-USMLE, with models such as Llama-3.1-8B and Qwen-2.5-7B, Selective CoT reduces inference time by 13–45% and generated tokens by 8–47% with accuracy degradation of no more than 4%; on some tasks it even improves both accuracy and efficiency.
📝 Abstract
Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining accuracy.
Methods: We propose Selective Chain-of-Thought (Selective CoT), an inference-time strategy that first predicts whether a question requires reasoning and generates a rationale only when needed. Two open-source LLMs (Llama-3.1-8B and Qwen-2.5-7B) were evaluated on four biomedical QA benchmarks: HeadQA, MedQA-USMLE, MedMCQA, and PubMedQA. Metrics included accuracy, total generated tokens, and inference time.
Results: Selective CoT reduced inference time by 13–45% and token usage by 8–47% with minimal accuracy loss (≤4%). In some model-task pairs, it achieved both higher accuracy and greater efficiency than standard CoT. Compared with fixed-length CoT, Selective CoT reached similar or superior accuracy at substantially lower computational cost.
Discussion: Selective CoT dynamically balances reasoning depth and efficiency by invoking explicit reasoning only when beneficial, reducing redundancy on recall-type questions while preserving interpretability.
Conclusion: Selective CoT provides a simple, model-agnostic, and cost-effective approach for medical QA, aligning reasoning effort with question complexity to enhance real-world deployability of LLM-based clinical systems.
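The selective-inference control flow described in the abstract can be sketched as follows. This is a minimal illustration only: the function and prompt wordings here (`needs_reasoning`, the step-by-step and direct-answer prompts) are hypothetical, and the paper's actual lightweight prediction module and prompting details are not specified in this summary.

```python
def answer(question, llm, needs_reasoning):
    """Selective CoT sketch: elicit a chain of thought only when a
    lightweight predictor flags the question as requiring reasoning."""
    if needs_reasoning(question):
        # Reasoning path: prompt the model for step-by-step rationale.
        prompt = f"{question}\nLet's think step by step."
    else:
        # Direct path: answer immediately, saving tokens and latency.
        prompt = f"{question}\nAnswer directly."
    return llm(prompt)

# Toy stubs to show the routing (not the paper's models or predictor):
llm = lambda p: p.splitlines()[-1]          # echoes the prompt's last line
needs_reasoning = lambda q: "why" in q.lower()

print(answer("Why does X cause Y?", llm, needs_reasoning))
print(answer("What is the recommended dose?", llm, needs_reasoning))
```

The point of the sketch is that routing happens per question at inference time, with no change to the underlying model, which is why the approach is model-agnostic.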