🤖 AI Summary
Can the success or failure of zero-shot chain-of-thought (CoT) reasoning be predicted before generating any tokens?
Method: The authors investigate whether discriminative signals for CoT success are already encoded in the initial hidden states of large language models (LLMs). They train a probe classifier on internal model representations—specifically, the pre-softmax logits and early-layer hidden states—to predict CoT outcomes at step zero, i.e., prior to token generation. They further evaluate whether this signal enables controllable early stopping during CoT inference.
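The core idea — a lightweight classifier trained on fixed model representations to predict a binary CoT outcome — can be sketched as a linear probe. The sketch below uses synthetic stand-ins for the step-zero hidden-state vectors and success labels; the model, layer choice, dimensionality, and training details are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for step-zero hidden states (e.g., the hidden vector
# of the prompt's last token) and binary CoT success labels. In the actual
# setup these would be extracted from an LLM; sizes here are illustrative.
hidden_dim, n_train, n_test = 64, 400, 100
w_true = rng.normal(size=hidden_dim)  # hypothetical "success direction"

def make_split(n):
    X = rng.normal(size=(n, hidden_dim))
    y = (X @ w_true + rng.normal(scale=0.5, size=n) > 0).astype(float)
    return X, y

X_train, y_train = make_split(n_train)
X_test, y_test = make_split(n_test)

# Linear probe trained with plain gradient descent on the logistic loss.
w = np.zeros(hidden_dim)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w)))          # predicted P(success)
    w -= 0.1 * X_train.T @ (p - y_train) / n_train     # gradient step

acc = np.mean(((X_test @ w) > 0) == (y_test > 0.5))
print(f"step-zero probe accuracy: {acc:.2f}")
```

Because the probe reads only the representation available before any token is generated, its accuracy directly measures how much of the eventual reasoning outcome is already encoded at step zero.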
Contribution/Results: The probe achieves high prediction accuracy—significantly outperforming a BERT-based baseline—demonstrating that LLMs encode generalizable, task-agnostic signals of reasoning success in their earliest representations. Moreover, leveraging this signal for early stopping (after only 1–2 CoT steps) substantially surpasses non-CoT baselines, though a gap to full CoT execution remains. This work is the first to show that early LLM representations contain robust, generalizable indicators of reasoning success, establishing a new paradigm for efficient and controllable reasoning optimization.
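The controllable early-stopping use of such a probe can be illustrated as a simple inference loop: after each CoT step, the probe scores the current state, and generation halts once its confidence crosses a threshold. Everything below is a hypothetical stub — `generate_cot_step` and `probe_confidence` stand in for an LLM call and a trained probe, and are not the authors' implementation.

```python
def generate_cot_step(steps):
    """Stub for one CoT reasoning step; a real system would query the LLM."""
    return steps + [f"step-{len(steps) + 1}"]

def probe_confidence(steps):
    """Stub probe: P(the answer is already determined) given the current
    state. A real probe would score the model's hidden representation."""
    return min(1.0, 0.4 + 0.3 * len(steps))

def early_stop_cot(max_steps=8, threshold=0.9):
    """Generate CoT steps until the probe is confident, then stop early."""
    steps = []
    for _ in range(max_steps):
        if probe_confidence(steps) >= threshold:
            break
        steps = generate_cot_step(steps)
    return steps

trace = early_stop_cot()
print(f"stopped after {len(trace)} steps: {trace}")
```

With these stub confidences the loop halts after two steps, mirroring the paper's observation that 1–2 CoT steps often suffice; in practice the threshold would trade off compute savings against accuracy.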
📝 Abstract
We investigate whether the success of a zero-shot Chain-of-Thought (CoT) process can be predicted before completion. We discover that a probing classifier, based on LLM representations, performs well *even before a single token is generated*, suggesting that crucial information about the reasoning process is already present in the initial steps' representations. In contrast, a strong BERT-based baseline, which relies solely on the generated tokens, performs worse, likely because it depends on shallow linguistic cues rather than deeper reasoning dynamics. Surprisingly, using later reasoning steps does not always improve classification. When additional context is unhelpful, earlier representations resemble later ones more closely, suggesting that LLMs encode key information early. This implies that reasoning can often stop early without loss. To test this, we conduct early-stopping experiments, showing that truncating CoT reasoning still improves performance over not using CoT at all, though a gap remains compared to full reasoning. However, approaches such as supervised learning or reinforcement learning designed to shorten CoT chains could leverage our classifier's guidance to identify when early stopping is effective. Our findings provide insights that may support such methods, helping to optimize CoT's efficiency while preserving its benefits. Code and data are available at [github.com/anum94/CoTpred](https://github.com/anum94/CoTpred).