Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets the latency and compute overhead of explicit chain-of-thought generation in large language models, which does not consistently help and can even degrade performance. The authors propose “Post-Reasoning”, in which the model gives its final answer first and justifies it afterward, so the answer is available without added latency or token cost. The method relies on simple instruction augmentation, and a supervised post-reason tuning variant internalizes the behavior through training. Across 117 model-benchmark settings (13 models, 4 model families, 9 benchmarks), Post-Reasoning improves performance in over 88.19% of settings, with a mean relative improvement of 17.37%. Supervised post-reason tuning improves performance in over 91.11% of evaluated settings and exceeds the prompt-based post-reasoning baseline by an average of 8.01%.
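The two headline statistics (share of settings improved and mean relative improvement) can be illustrated with a small sketch. It assumes relative improvement is defined as `100 * (treated - baseline) / baseline` and that the mean is taken over all settings; the summary does not state the paper's exact definition, so both are assumptions. The function names and the scores are illustrative and are not taken from the paper.

```python
from typing import List, Tuple


def relative_improvement(baseline: float, treated: float) -> float:
    """Relative change in percent (assumed definition, not confirmed by the source)."""
    return 100.0 * (treated - baseline) / baseline


def summarize(settings: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (percent of settings improved, mean relative improvement in percent).

    Each setting is a (baseline_score, treated_score) pair for one
    model-benchmark combination.
    """
    gains = [relative_improvement(b, t) for b, t in settings]
    improved = sum(1 for g in gains if g > 0)
    share_improved = 100.0 * improved / len(gains)
    mean_gain = sum(gains) / len(gains)
    return share_improved, mean_gain


# Made-up scores for four hypothetical settings (NOT results from the paper).
example_settings = [(50.0, 60.0), (40.0, 38.0), (80.0, 88.0), (20.0, 25.0)]
share, mean_gain = summarize(example_settings)
print(f"improved in {share:.2f}% of settings, mean relative gain {mean_gain:.2f}%")
```

Note that under this definition a setting that regresses contributes a negative gain to the mean, so the two numbers describe different things: one counts wins, the other averages magnitudes.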

📝 Abstract

As the adoption of Large Language Models (LLMs) accelerates, token consumption from intermediate reasoning traces increasingly contributes to inference latency and operational cost. Recent studies suggest that many real-world tasks require little to no explicit reasoning, and that additional reasoning sometimes even degrades performance. In this work, we propose Post-Reasoning, a simple yet effective approach that improves instruction-tuned models by conditioning them to justify their answers after generating the final response. By design, it enables the final answer to be obtained without additional latency or token cost, while still improving performance through simple instruction augmentation. We evaluate Post-Reasoning across 117 model-benchmark settings spanning 13 open and proprietary models, 4 model families, and 9 diverse reasoning and knowledge-intensive benchmarks, including AMC, HMMT, GSM8K, GPQA, MMLU-Pro, and BIG-Bench Hard. Post-Reasoning improves performance in over 88.19% of evaluated settings, achieving a mean relative improvement of 17.37%. Furthermore, we propose supervised post-reason tuning, which improves performance in over 91.11% of evaluated settings and exceeds the prompt-based post-reasoning baseline by an average of 8.01%, demonstrating that post-reasoning can be effectively internalized through training. Ultimately, Post-Reasoning establishes a new performance ceiling for direct-answer capabilities.
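The answer-first idea described in the abstract might look roughly like the sketch below. It appends an instruction asking the model to state its answer before any justification, then reads the answer from the first marked line of the completion. The instruction wording, the `Answer:` marker, and both function names are hypothetical; the paper's actual prompt is not given here.

```python
from typing import Optional

# Hypothetical instruction text; the paper's exact wording is not given in this summary.
POST_REASONING_INSTRUCTION = (
    "State your final answer first on a line starting with 'Answer:'. "
    "Then justify that answer on the following lines."
)

ANSWER_MARKER = "Answer:"  # assumed marker, chosen for this illustration


def augment_prompt(question: str) -> str:
    """Append the answer-first instruction to a task prompt."""
    return f"{question.strip()}\n\n{POST_REASONING_INSTRUCTION}"


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the first answer marker, ignoring the justification."""
    for line in completion.splitlines():
        stripped = line.strip()
        if stripped.startswith(ANSWER_MARKER):
            return stripped[len(ANSWER_MARKER):].strip()
    return None  # the model did not follow the requested format


# Example with a hand-written completion (no model call is made here).
prompt = augment_prompt("What is 6 times 7?")
completion = "Answer: 42\nSix groups of seven sum to forty-two."
print(extract_answer(completion))
```

Because the answer line precedes the justification, a caller that only needs the answer could stop reading (or stop decoding) once that line is complete, which is one way to interpret the claim that the answer comes at no extra latency or token cost.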

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

reasoning efficiency

inference latency

token cost

direct-answer performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-Reasoning

instruction tuning

reasoning efficiency