S3-CoT: Self-Sampled Succinct Reasoning Enables Efficient Chain-of-Thought LLMs

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency of large language models (LLMs) in chain-of-thought reasoning, which often stems from redundant steps that lack the fluency characteristic of human System 1 thinking. To overcome this, the authors propose a self-sampling framework that requires neither external teachers nor human annotations. The approach leverages activation-guided generation to produce style-consistent, variable-length reasoning trajectories directly from the target model itself. These trajectories are then filtered via prediction consistency and used in a curriculum-based supervised fine-tuning process with progressive compression. Notably, the method enables self-evolutionary training even in the absence of ground-truth answers, significantly improving both reasoning efficiency and performance on mathematical benchmarks. Furthermore, it demonstrates strong generalization across domains, such as medicine, and is applicable to both general-purpose and R1-style LLMs.
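The pipeline in the summary can be sketched as a minimal, runnable toy. This is not the paper's implementation: the real method steers hidden activations of the target LLM to vary trace length, which is stubbed out here, and all function names (`sample_with_steering`, `filter_by_consistency`, `compression_curriculum`) are hypothetical.

```python
# Hedged sketch of the three stages described above: (1) self-sample
# variable-length traces, (2) filter by prediction consistency without
# gold answers, (3) order survivors for a progressive-compression curriculum.
from collections import Counter

def sample_with_steering(model, question, strengths):
    """Stub for activation-guided generation: one trace per steering strength."""
    return [model(question, s) for s in strengths]

def filter_by_consistency(traces, min_agree=2):
    """Keep traces whose final answer matches the majority answer,
    provided the majority is large enough (no ground truth needed)."""
    answers = Counter(t["answer"] for t in traces)
    top, count = answers.most_common(1)[0]
    if count < min_agree:
        return []
    return [t for t in traces if t["answer"] == top]

def compression_curriculum(traces):
    """Order surviving traces from longest to shortest reasoning, so SFT
    targets become progressively more compressed (System-1-like)."""
    return sorted(traces, key=lambda t: len(t["steps"]), reverse=True)

# Toy stand-in for the target LLM: stronger steering -> shorter trace.
def toy_model(question, strength):
    n_steps = max(1, 4 - strength)
    return {"steps": [f"step{i}" for i in range(n_steps)],
            "answer": "42" if strength < 3 else "wrong"}

traces = sample_with_steering(toy_model, "2*21?", strengths=[0, 1, 2, 3])
kept = filter_by_consistency(traces)
curriculum = compression_curriculum(kept)
print([len(t["steps"]) for t in curriculum])  # -> [4, 3, 2]
```

The consistency filter is the key self-supervised step: the inconsistent (`"wrong"`) trace is dropped by majority vote alone, and the curriculum then feeds SFT from the longest to the most compressed trace.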

📝 Abstract
Large language models (LLMs) equipped with chain-of-thought (CoT) achieve strong performance and offer a window into LLM behavior. However, recent evidence suggests that improvements in CoT capabilities often come with redundant reasoning processes, motivating a key question: Can LLMs acquire a fast-thinking mode analogous to human System 1 reasoning? To explore this, our study presents a self-sampling framework based on activation steering for efficient CoT learning. Our method can induce style-aligned and variable-length reasoning traces from target LLMs themselves without any teacher guidance, thereby alleviating a central bottleneck of SFT-based methods: the scarcity of high-quality supervision data. Using data filtered by gold answers, we perform SFT for efficient CoT learning with (i) a human-like dual-cognitive system, and (ii) a progressive compression curriculum. Furthermore, we explore a self-evolution regime in which SFT is driven solely by prediction-consistent data of variable-length variants, eliminating the need for gold answers. Extensive experiments on math benchmarks, together with cross-domain generalization tests in medicine, show that our method yields stable improvements for both general and R1-style LLMs. Our data and model checkpoints can be found at https://github.com/DYR1/S3-CoT.
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought
reasoning efficiency
redundant reasoning
fast-thinking
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Sampled Reasoning
Activation Steering
Efficient Chain-of-Thought
Dual-Cognitive System
Self-Evolution