🤖 AI Summary
This work evaluates the causal effect of intermediate steps in implicit chain-of-thought (CoT) reasoning on final-answer correctness. To this end, we model implicit CoT as a structural causal model (SCM) in representation space and use do-intervention analysis to assess the causal necessity of latent reasoning steps, trace how influence propagates across steps, and examine answer-commitment mechanisms. Through the lens of causal intervention, we reveal for the first time the stage-wise functionality and non-local routing inherent in implicit CoT, and we propose an analytical framework that combines mode conditioning with stability awareness. Our experiments show that the latent-step budget is allocated stage-wise rather than uniformly, and that output bias emerges before representational commitment, leaving a persistent gap. These findings suggest new objectives for improving the training and decoding of implicit reasoning systems.
📝 Abstract
Latent or continuous chain-of-thought methods replace explicit textual rationales with a budget of internal latent steps, but these intermediate computations are difficult to evaluate beyond correlation-based probes. In this paper, we view latent chain-of-thought as a manipulable causal process in representation space by modeling latent steps as variables in a structural causal model (SCM) and analyzing their effects through step-wise $\mathrm{do}$-interventions. We study two representative paradigms, Coconut and CODI, on both mathematical and general reasoning tasks to investigate three key questions: (1) which steps are causally necessary for correctness, and when answers become decidable early; (2) how influence propagates across steps, and how this structure compares to explicit CoT; and (3) whether intermediate trajectories retain competing answer modes, and how output-level commitment differs from representational commitment across steps. We find that latent-step budgets behave less like homogeneous extra depth and more like staged functionality with non-local routing, and we identify a persistent gap between early output bias and late representational commitment. These results motivate mode-conditional and stability-aware analyses, together with corresponding training and decoding objectives, as more reliable tools for interpreting and improving latent reasoning systems.
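To make the intervention setup concrete, below is a minimal sketch of a step-wise $\mathrm{do}$-intervention on latent CoT states, assuming a Coconut-style rollout in which the last hidden state is fed back as the next latent step. The model interface (`encode`, `latent_step`, `readout`) and the log-probability effect measure are illustrative assumptions for this sketch, not the paper's released implementation.

```python
import torch

# Minimal sketch: step-wise do-intervention on latent chain-of-thought states.
# Assumed (hypothetical) model interface:
#   model.encode(x)          -> initial representation of the prompt
#   model.latent_step(state) -> one implicit reasoning step in representation space
#   model.readout(state)     -> answer logits decoded from the final latent state

def run_latent_cot(model, x, num_steps):
    """Roll out the latent chain-of-thought and keep every intermediate state."""
    state = model.encode(x)
    states = [state]
    for _ in range(num_steps):
        state = model.latent_step(state)
        states.append(state)
    return states

def do_intervention(model, x, num_steps, step_idx, counterfactual_state):
    """do(h_t = counterfactual): clamp step t, then recompute all downstream steps."""
    state = model.encode(x)
    for t in range(num_steps):
        state = model.latent_step(state)
        if t == step_idx:
            state = counterfactual_state  # intervention: overwrite the latent step
    return model.readout(state)           # answer logits under the intervention

def causal_effect(model, x, num_steps, step_idx, counterfactual_state, answer_id):
    """Effect of intervening on step t, measured as the drop in answer log-probability."""
    with torch.no_grad():
        base_logits = model.readout(run_latent_cot(model, x, num_steps)[-1])
        int_logits = do_intervention(model, x, num_steps, step_idx, counterfactual_state)
    base_lp = torch.log_softmax(base_logits, dim=-1)[..., answer_id]
    int_lp = torch.log_softmax(int_logits, dim=-1)[..., answer_id]
    return (base_lp - int_lp).item()      # larger drop => step is more causally necessary
```

Sweeping `step_idx` over the latent budget (and choosing counterfactual states from, e.g., other problems or resampled rollouts) yields a per-step necessity profile of the kind the causal analysis described above relies on.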