🤖 AI Summary
This work investigates the out-of-distribution (OOD) generalization robustness of chain-of-thought (CoT) prompting, focusing on its sensitivity to latent-variable reordering and uniform scaling. Within a latent-variable modeling framework, we propose the first quantitative method for characterizing the relationship between CoT's generalization performance and latent-variable similarity, and we formalize the OOD robustness boundary of CoT reasoning. Empirical results demonstrate that CoT maintains strong reasoning capability when OOD samples exhibit high latent-variable similarity to the training distribution, but that performance degrades significantly as this similarity decreases. Our analysis uncovers the intrinsic mechanisms and fundamental limitations governing CoT generalization, revealing that latent alignment, rather than surface-level pattern matching, underpins its reasoning efficacy. Together, these results provide theoretical foundations and empirical evidence for developing reliable, distribution-robust reasoning methods.
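The summary does not state which similarity measure is used. As a minimal sketch of how such a quantity could be computed, assuming the latent variables are real-valued vectors, the snippet below scores an OOD latent by its nearest-neighbor cosine similarity to the training latents; the function name `latent_similarity` and the metric choice are illustrative assumptions, not the paper's method.

```python
import numpy as np

def latent_similarity(z_ood: np.ndarray, z_train: np.ndarray) -> float:
    """Similarity of one OOD latent vector to a set of training latents,
    measured as the maximum cosine similarity over the training set.
    The metric is an illustrative assumption, not the paper's definition."""
    train_norm = z_train / np.linalg.norm(z_train, axis=1, keepdims=True)
    ood_norm = z_ood / np.linalg.norm(z_ood)
    return float(np.max(train_norm @ ood_norm))

rng = np.random.default_rng(0)
z_train = rng.normal(size=(100, 8))              # 100 training latents in R^8
z_near = z_train[0] + 0.05 * rng.normal(size=8)  # small perturbation of a training latent
z_far = 10.0 * rng.normal(size=8)                # an unrelated draw

print(latent_similarity(z_near, z_train))  # close to 1.0: CoT expected to generalize
print(latent_similarity(z_far, z_train))   # noticeably smaller: degradation expected
```

Under the paper's claim, CoT accuracy on an OOD sample would track a score of this kind: high similarity predicts preserved reasoning, low similarity predicts degradation.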
📝 Abstract
Chain-of-Thought (CoT) prompting has emerged as a powerful technique for improving in-context learning (ICL) in large language models (LLMs) by breaking complex reasoning into intermediate steps. However, the ability of CoT to generalize under distribution shift remains poorly understood. In this work, we extend a latent-variable framework for CoT prompting and study its behavior in two prototypical out-of-distribution (OOD) scenarios: (i) the latent variables for the CoT steps are permuted into novel combinations, and (ii) the latent variables are uniformly scaled by a constant factor. Our experiments demonstrate that CoT inference generalizes effectively to OOD samples whose latent variables closely resemble those seen during training, but that its performance degrades as this similarity decreases. These findings provide foundational insights into the strengths and limitations of CoT prompting under OOD conditions and suggest directions for developing more resilient reasoning strategies in future LLMs.
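To make the two OOD scenarios concrete, here is a minimal sketch of the corresponding latent-variable transformations, assuming each CoT step is governed by a real-valued latent vector; the number of steps, the dimensionality, the Gaussian sampling, and the factor `scale` are illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent variables: one d-dimensional vector per CoT step.
num_steps, dim = 4, 8
z = rng.normal(size=(num_steps, dim))

# OOD scenario (i): permute the per-step latents into a novel combination,
# so familiar step latents appear in an order never seen during training.
perm = rng.permutation(num_steps)
z_permuted = z[perm]

# OOD scenario (ii): scale every latent uniformly by a single factor,
# shifting the whole latent configuration away from the training support.
scale = 2.0  # the value of the factor here is arbitrary
z_scaled = scale * z
```

Permutation preserves each individual latent while breaking their joint combination, whereas uniform scaling preserves the combination while moving every latent; the two scenarios therefore probe complementary kinds of distribution shift.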