🤖 AI Summary
This work addresses the high computational cost of explicit chain-of-thought (CoT) reasoning in large language models, and a limitation of existing implicit (latent) reasoning methods: they compress all reasoning steps uniformly and often over-compress the precision-critical steps that accuracy depends on. The authors propose Selective Latent Thinking (SLT), a framework that uses a confidence-based gating mechanism to compress redundant spans of a reasoning trajectory into latent representations while keeping precision-critical spans as explicit CoT. SLT is trained with a three-stage strategy that combines span-level latent compression, reliability-aware future-reasoning prediction with a lightweight decoder, and trajectory-level reinforcement learning. On four mathematical reasoning benchmarks, SLT achieves 22.7% higher accuracy than latent reasoning baselines at comparable compression ratios, and reduces reasoning chain length by 58.4% relative to explicit CoT at a cost of only 2.8% in accuracy.
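A minimal sketch of how a confidence-based gate could pick which part of a predicted span to compress. This is an illustration under stated assumptions, not SLT's implementation: it assumes the lightweight decoder returns a probability for each token of its predicted span, and that a prefix counts as "reliable" when its joint probability stays at or above a threshold `tau`. The names `gate_span`, `split_span`, and `tau` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gate_span(token_probs: Sequence[float], tau: float = 0.8) -> int:
    """Return the length of the longest prefix of a predicted span whose
    joint probability is at least `tau` (hypothetical gating rule)."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must be in (0, 1]")
    log_tau = math.log(tau)
    log_joint = 0.0  # log of the product of probabilities accepted so far
    accepted = 0
    for p in token_probs:
        if p <= 0.0:
            break  # a zero-probability token can never be accepted
        log_joint += math.log(p)
        if log_joint < log_tau:
            break  # probabilities are <= 1, so the joint only shrinks; stop here
        accepted += 1
    return accepted


def split_span(tokens: Sequence[str], token_probs: Sequence[float],
               tau: float = 0.8) -> Tuple[List[str], List[str]]:
    """Split a predicted span into (compress_to_latent, keep_explicit)."""
    k = gate_span(token_probs, tau)
    return list(tokens[:k]), list(tokens[k:])


# Hypothetical decoder output for an arithmetic step.
latent, explicit = split_span(["3", "*", "4", "=", "12"],
                              [0.99, 0.97, 0.95, 0.90, 0.40])
print(latent, explicit)  # ['3', '*', '4', '='] ['12']
```

Using the joint probability rather than a per-token threshold means a long run of merely "fairly confident" tokens is eventually cut off; in the example, the low-confidence numeric result stays explicit while the predictable prefix is compressed.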
📝 Abstract
Explicit chain-of-thought (CoT) reasoning substantially improves the reasoning ability of large language models (LLMs), but incurs high inference cost due to lengthy autoregressive traces. Existing latent reasoning methods offer a promising alternative, yet they often treat reasoning as uniformly compressible, causing precision-critical intermediate steps to be overly compressed and thereby degrading reasoning accuracy. In this work, we propose Selective Latent Thinking (SLT), a framework that selectively compresses redundant reasoning spans into latent representations while preserving precision-critical spans as explicit CoT within the same reasoning trajectory. Specifically, SLT first uses a lightweight decoder to anticipate a short upcoming reasoning span, and then applies confidence-based gating to determine the longest span that can be reliably compressed. The accepted span is encoded into a compact latent representation to improve reasoning efficiency, while uncertain or precision-critical reasoning remains in explicit CoT form to preserve accuracy. To learn this selective compression policy, SLT adopts a three-stage training strategy that combines span-level latent compression, reliability-aware future reasoning prediction, and trajectory-level reinforcement learning to optimize the trade-off between answer correctness and reasoning cost. Extensive experiments across four mathematical reasoning benchmarks demonstrate that SLT achieves 22.7% higher accuracy than latent reasoning baselines at comparable compression ratios, while reducing reasoning chain length by 58.4% with only 2.8% accuracy degradation compared to explicit CoT. Our code is available at https://github.com/hunshi34/SLT.
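To make the correctness/cost trade-off concrete, here is one hypothetical shape a trajectory-level reward could take. The abstract does not give SLT's actual reward; this sketch assumes a binary correctness term minus a length penalty, where every explicit CoT token and every latent step counts as one decoding step, normalized by a budget `max_steps` and weighted by `lam`. The function name and both parameters are illustrative assumptions.

```python
def trajectory_reward(is_correct: bool, explicit_tokens: int, latent_steps: int,
                      max_steps: int, lam: float = 0.2) -> float:
    """Hypothetical trajectory-level reward: 1 for a correct final answer,
    minus `lam` times the normalized reasoning cost."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Assumption: each explicit token and each latent step costs one decoding step.
    cost = min((explicit_tokens + latent_steps) / max_steps, 1.0)  # clipped to [0, 1]
    return float(is_correct) - lam * cost


# A correct trajectory that used 100 explicit tokens and 20 latent steps.
r = trajectory_reward(True, explicit_tokens=100, latent_steps=20, max_steps=600)
print(round(r, 4))  # 0.96
```

With `lam < 1`, any correct trajectory still outscores any incorrect one, so the penalty only reorders trajectories of equal correctness and pushes the policy toward compressing more of the reasoning without rewarding short wrong answers over long right ones.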