SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs

📅 2025-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing chain-of-thought (CoT) reasoning is constrained by hard decoding over discrete token vocabularies, while continuous-space alternatives often suffer from catastrophic forgetting and typically require fine-tuning the base large language model (LLM), thereby compromising zero-shot capability. To address this, we propose SoftCoT: a lightweight, continuous-space reasoning framework that operates without modifying or fine-tuning the main LLM. SoftCoT introduces soft reasoning tokens—generated by a lightweight assistant model—and maps them into the LLM’s hidden space via a learnable projection module. Only this projection is optimized, using supervised, parameter-efficient fine-tuning (PEFT). Crucially, SoftCoT fully preserves the original model’s zero-shot performance while training only 0.03% of its parameters. Evaluated on five mainstream reasoning benchmarks, SoftCoT achieves an average accuracy improvement of 4.2%, marking the first approach to enable zero-shot-compatible, optimizable, and fine-tuning-free continuous reasoning in large language models.
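The pipeline described above can be sketched in plain Python. Everything here is an illustrative assumption (toy dimensions, random vectors standing in for model hidden states, a bias-free linear projection); the paper's actual models and code are not reproduced. The point is the data flow: a frozen assistant emits soft thought tokens in its own hidden space, and the only trainable piece maps them into the frozen LLM's hidden space.

```python
import random

def matmul(a, b):
    """Naive matrix multiply: (n x k) @ (k x m) -> (n x m)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def assistant_soft_tokens(num_tokens, d_assistant, rng):
    # Stand-in for the frozen assistant model's hidden states:
    # in SoftCoT these would be instance-specific soft thought tokens.
    return [[rng.uniform(-1.0, 1.0) for _ in range(d_assistant)]
            for _ in range(num_tokens)]

class Projection:
    """The only trainable module: maps assistant hidden size -> LLM hidden size."""
    def __init__(self, d_in, d_out, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(d_out)]
                  for _ in range(d_in)]

    def __call__(self, soft_tokens):
        return matmul(soft_tokens, self.w)

rng = random.Random(0)
d_assistant, d_llm, n_soft = 8, 16, 4   # toy sizes, not the paper's

soft = assistant_soft_tokens(n_soft, d_assistant, rng)
projected = Projection(d_assistant, d_llm, rng)(soft)

# The projected soft tokens would be prepended to the LLM's input
# embeddings; the LLM itself stays frozen, preserving zero-shot ability.
print(len(projected), len(projected[0]))  # 4 16
```

Because both language models stay frozen, gradients during training flow only through the projection's weight matrix, which is what keeps the trainable parameter count tiny.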

📝 Abstract
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks by generating intermediate reasoning steps. However, most existing approaches focus on hard token decoding, which constrains reasoning within the discrete vocabulary space and may not always be optimal. While recent efforts explore continuous-space reasoning, they often suffer from catastrophic forgetting, limiting their applicability to state-of-the-art LLMs that already perform well in zero-shot settings with a proper instruction. To address this challenge, we propose a novel approach for continuous-space reasoning that does not require modifying the underlying LLM. Specifically, we employ a lightweight assistant model to generate instance-specific soft thought tokens speculatively as the initial chain of thoughts, which are then mapped into the LLM's representation space via a projection module. Experimental results on five reasoning benchmarks demonstrate that our method enhances LLM reasoning performance through supervised, parameter-efficient fine-tuning.
Problem

Research questions and friction points this paper is trying to address.

Efficient reasoning in LLMs
Limitations of discrete hard-token decoding in CoT
Catastrophic forgetting in continuous-space reasoning approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous-space reasoning approach
Lightweight assistant model usage
Parameter-efficient fine-tuning method
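The 0.03% trainable-parameter figure from the summary can be sanity-checked with back-of-the-envelope arithmetic. The hidden sizes and base-model size below are hypothetical round numbers chosen for illustration, not values taken from the paper:

```python
d_assistant, d_llm = 1024, 2048    # hypothetical hidden sizes (assumption)
llm_params = 7_000_000_000         # e.g. a 7B-parameter base LLM (assumption)

# A single bias-free linear projection from assistant space to LLM space.
proj_params = d_assistant * d_llm
fraction = proj_params / llm_params

print(f"{proj_params:,} trainable params = {fraction:.2%} of the LLM")
# -> 2,097,152 trainable params = 0.03% of the LLM
```

A projection of roughly this size against a multi-billion-parameter frozen LLM lands in the same ballpark as the 0.03% reported in the summary.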