Self-Training Elicits Concise Reasoning in Large Language Models

📅 2025-02-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the inference overhead caused by redundant tokens in large language models' (LLMs) chain-of-thought (CoT) reasoning, this paper proposes a self-training approach that requires no human annotation and modifies neither the model architecture nor the inference procedure. The method elicits LLMs' latent capacity for concise reasoning: it uses best-of-N sampling and few-shot conditioning to collect self-generated concise reasoning paths, then fine-tunes on them in task-specific settings, exploiting the model's inherent stochasticity and in-context learning ability to produce compact rationales. Evaluated on GSM8K and MATH across five model families, the approach reduces average output token count by 30% while maintaining average accuracy, substantially improving inference efficiency.

📝 Abstract
Chain-of-thought (CoT) reasoning has enabled large language models (LLMs) to utilize additional computation through intermediate tokens to solve complex tasks. However, we posit that typical reasoning traces contain many redundant tokens, incurring extraneous inference costs. Upon examination of the output distribution of current LLMs, we find evidence on their latent ability to reason more concisely, relative to their default behavior. To elicit this capability, we propose simple fine-tuning methods which leverage self-generated concise reasoning paths obtained by best-of-N sampling and few-shot conditioning, in task-specific settings. Our combined method achieves a 30% reduction in output tokens on average, across five model families on GSM8K and MATH, while maintaining average accuracy. By exploiting the fundamental stochasticity and in-context learning capabilities of LLMs, our self-training approach robustly elicits concise reasoning on a wide range of models, including those with extensive post-training. Code is available at https://github.com/TergelMunkhbat/concise-reasoning
Problem

Research questions and friction points this paper is trying to address.

Typical CoT traces contain many redundant tokens, inflating inference cost
LLMs' latent ability to reason concisely goes unused by default
Token reduction must not come at the expense of accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-training fine-tunes models on self-generated concise reasoning paths
Best-of-N sampling selects short, correct rationales for training
Few-shot conditioning elicits concise outputs while preserving accuracy
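The best-of-N selection step above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the `sample_fn` sampler and `is_correct` checker are hypothetical stand-ins for an LLM decoding call and an answer verifier, and the selected paths would subsequently be used as fine-tuning targets.

```python
def best_of_n_concise(sample_fn, n, is_correct):
    """Sample n reasoning paths and keep the shortest correct one.

    sample_fn: zero-argument callable returning one reasoning string
               (stochastic, e.g. temperature-sampled LLM output).
    is_correct: callable checking whether a path reaches the right answer.
    Returns None if no sampled path is correct.
    """
    correct = [p for p in (sample_fn() for _ in range(n)) if is_correct(p)]
    return min(correct, key=len) if correct else None


# Toy stand-in for an LLM: a fixed pool of rationales of varying length.
paths = [
    "Step 1: 6 * 7 = 42. Step 2: check the product again. Answer: 42",
    "6 * 7 = 42. Answer: 42",
    "6 * 7 = 48. Answer: 48",
]
sampler = iter(paths)
result = best_of_n_concise(
    lambda: next(sampler), 3, lambda p: p.endswith("Answer: 42")
)
print(result)  # shortest path whose final answer is correct
```

Fine-tuning on such self-selected shortest-correct paths is what lets the model internalize concision without any human-written short rationales.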