🤖 AI Summary
Large language models (LLMs) exhibit degraded reasoning quality under no-chain-of-thought (no-CoT) inference, where explicit step-by-step reasoning is omitted. Method: This paper proposes the 3TF framework, which first imparts implicit reasoning capability via structured thought training and then enforces concise, step-free answers through inference-time output constraints. It introduces a Short-to-Long training paradigm that combines hybrid-mode training, fine-tuning on CoT-annotated data, and an inference-time non-reasoning-mode constraint. Contribution/Results: Experiments demonstrate significant performance gains across multiple mainstream reasoning benchmarks under no-CoT settings. This work provides the first systematic empirical validation of high-quality implicit reasoning, establishing its feasibility and its advantage over conventional approaches, and introduces a new paradigm for efficient, lightweight LLM inference.
📝 Abstract
Recent advances in large language models (LLMs) have leveraged explicit Chain-of-Thought (CoT) prompting to improve reasoning accuracy. However, most existing methods primarily compress verbose reasoning outputs. These Long-to-Short transformations aim to improve efficiency, but still rely on explicit reasoning during inference. In this work, we introduce **3TF** (**T**hought-**T**raining and **T**hought-**F**ree inference), a framework for efficient reasoning that takes a Short-to-Long perspective. We first train a hybrid model that can operate in both reasoning and non-reasoning modes, and then further train it on CoT-annotated data to internalize structured reasoning, while enforcing concise, thought-free outputs at inference time using the no-reasoning mode. Unlike compression-based approaches, 3TF improves the reasoning quality of non-reasoning outputs, enabling models to perform rich internal reasoning implicitly while keeping external outputs short. Empirically, 3TF-trained models obtain large improvements on reasoning benchmarks under thought-free inference, demonstrating that high quality reasoning can be learned and executed implicitly without explicit step-by-step generation.
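The abstract describes selecting a non-reasoning mode at inference time and keeping visible outputs thought-free. The paper does not specify its chat template or control tokens, so the sketch below is purely illustrative: the `<reasoning>`/`<no_reasoning>` mode tags and the `<think>…</think>` span format are hypothetical placeholders for whatever markers a hybrid model would actually use.

```python
import re

# Hypothetical marker for explicit reasoning spans; many hybrid models
# wrap chain-of-thought in tags like this, but 3TF's actual format is
# not given in the abstract.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def build_prompt(question: str, reasoning: bool) -> str:
    """Wrap a question in a hypothetical template whose mode tag
    selects reasoning vs. non-reasoning behavior at inference time."""
    mode = "<reasoning>" if reasoning else "<no_reasoning>"
    return f"{mode}\nUser: {question}\nAssistant:"

def enforce_thought_free(output: str) -> str:
    """Strip any explicit <think>...</think> spans so the visible answer
    stays concise even if the model emits reasoning tokens anyway."""
    return THINK_RE.sub("", output).strip()
```

Under this reading, "thought-free inference" is a decoding-side constraint: the prompt requests the no-reasoning mode, and any stray reasoning span is filtered, e.g. `enforce_thought_free("<think>2+2=4</think>The answer is 4.")` yields just `"The answer is 4."`.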