What Makes Chain-of-Thought Work at Probe Time? Local Co-occurrence Rather Than Global Derivation

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates which properties of a rationale text drive the accuracy gains of chain-of-thought (CoT) prompting at probe time, that is, when a fixed rationale is already in context. The authors hold the rationale's content fixed and systematically perturb its structure, using global word shuffling and local window preservation, in controlled experiments across multiple model families, parameter scales, and datasets. They report two complementary sources of gain: a lexical activation effect, since even a globally shuffled rationale substantially outperforms the no-rationale baseline, and short-range token adjacency, since preserving contiguous windows of just two to three tokens recovers most of the remaining gain toward full CoT performance. Sentence-level logical ordering, full grammatical realization, and copying of answer declarations or values are not the primary drivers. The authors propose a local co-occurrence activation (LCA) account, under which probe-time CoT gains appear to arise primarily from low-order lexical statistics rather than sentence-level logical derivation.

📝 Abstract

Chain-of-thought (CoT) prompting reliably improves language-model accuracy, but which properties of a rationale text drive the improvement is poorly understood. Prior work has largely studied generation-time behavior. We instead ask a probe-time question: given a fixed rationale in context, what in that text changes the answer? We identify two complementary sources of the gain. First, even a globally word-shuffled rationale substantially outperforms the no-rationale baseline, indicating a strong lexical activation effect. More importantly, the additional gain from structured text appears to arise less from sentence-level logical ordering and more from short-range token adjacency. Preserving contiguous windows of just $n^\star{=}2$--$3$ tokens recovers most of the remaining gain toward full CoT performance. Supporting experiments rule out copying of explicit answer declarations or answer values, as well as full grammatical realization, as primary drivers. Further generalization experiments show that the qualitative pattern remains stable across multiple model families, parameter scales, and datasets. These results support a local co-occurrence activation (LCA) account of probe-time CoT, in which the observed gains appear to arise primarily from lexical activation and short-range token co-occurrence rather than sentence-level logical derivation.
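
The two perturbations described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names `global_shuffle`, `window_shuffle`, and `recovered_fraction` are hypothetical, tokens are taken to be whitespace-separated words, windows are non-overlapping, and "recovered gain" is read as the share of the gap between the shuffled and full-CoT accuracies that a window-preserving rationale closes.

```python
import random
from typing import List


def global_shuffle(tokens: List[str], seed: int = 0) -> List[str]:
    """Permute all tokens: word order is destroyed, the bag of words is kept."""
    rng = random.Random(seed)
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled


def window_shuffle(tokens: List[str], n: int, seed: int = 0) -> List[str]:
    """Cut the tokens into contiguous windows of n, then permute the windows.

    Adjacency inside each window survives; ordering across windows does not.
    Non-overlapping windows are an assumption of this sketch.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    windows = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    rng = random.Random(seed)
    rng.shuffle(windows)
    return [tok for window in windows for tok in window]


def recovered_fraction(acc_window: float, acc_shuffled: float, acc_full: float) -> float:
    """Share of the shuffled-to-full-CoT accuracy gap closed by window preservation.

    Hypothetical metric definition, assumed for this sketch.
    """
    gap = acc_full - acc_shuffled
    if gap == 0:
        raise ValueError("full and shuffled accuracies are equal; no gap to recover")
    return (acc_window - acc_shuffled) / gap


if __name__ == "__main__":
    rationale = "there are 3 boxes with 4 apples each so 3 times 4 is 12".split()
    print(" ".join(global_shuffle(rationale, seed=1)))
    print(" ".join(window_shuffle(rationale, n=2, seed=1)))
```

With `n=1` the window shuffle reduces to a global shuffle, and with `n` equal to the rationale length it returns the text unchanged, so sweeping `n` interpolates between the two conditions the abstract compares.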

Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought

probe-time

local co-occurrence

lexical activation

rationale

Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought

local co-occurrence

lexical activation