🤖 AI Summary
This work proposes "opaque serial depth" (OSD) as a theoretical upper bound on how much complex reasoning large language models can perform implicitly, that is, without externalized reasoning traces such as chain-of-thought. By combining Transformer architecture analysis, computational complexity modeling, and automated graph traversal, the authors formalize this metric and develop a general-purpose evaluation framework. Empirical results suggest that Mixture-of-Experts (MoE) architectures likely have lower OSD than dense models, implying a greater reliance on explicit reasoning mechanisms. The work also establishes numerical upper bounds on the OSD of Gemma 3 models. The accompanying open-source framework offers a novel lens for probing the intrinsic reasoning capabilities of language models.
📝 Abstract
Large language models (LLMs) tend to externalize their reasoning in their chain of thought, making it a good target for monitoring. This is partially an inherent feature of the Transformer architecture: sufficiently long serial cognition must pass through the chain of thought (Korbak et al., 2025). We formalize this argument through the notion of opaque serial depth: the length of the longest computation a model can perform without the use of interpretable intermediate steps like chain of thought. Given this formalization, we compute numeric upper bounds on the opaque serial depth of Gemma 3 models, as well as asymptotic results for additional architectures beyond standard LLMs. We also open-source an automated method that can calculate upper bounds on the opaque serial depth of arbitrary neural networks, and use it to demonstrate that Mixture-of-Experts models likely have lower depth than dense models. Overall, our results suggest that opaque serial depth is a useful tool for understanding the potential for models to do significant reasoning that is not externalized.
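To give a concrete intuition for the kind of automated bound the abstract describes, here is a minimal sketch (not the paper's released framework): if a network's forward pass is viewed as a directed acyclic computation graph, one simple upper bound on its serial depth is the longest path through that graph. The function name and the toy graph below are illustrative assumptions, not taken from the paper.

```python
# Sketch: upper-bound serial depth as the longest path (counted in nodes)
# through a computation DAG given as {node: [successor, ...]}.
# This is an illustrative assumption about how such a bound could be computed,
# not the paper's actual algorithm.

def serial_depth_upper_bound(graph):
    """Return the number of nodes on the longest path through a DAG,
    using memoized depth-first search."""
    memo = {}

    def depth(node):
        if node not in memo:
            # Depth of a node = 1 + deepest chain among its successors.
            memo[node] = 1 + max(
                (depth(succ) for succ in graph.get(node, [])), default=0
            )
        return memo[node]

    return max(depth(node) for node in graph)

# Hypothetical toy graph: a residual stream with attention and MLP branches.
toy = {
    "embed": ["attn", "resid"],
    "attn": ["resid"],
    "resid": ["mlp", "out"],
    "mlp": ["out"],
    "out": [],
}
print(serial_depth_upper_bound(toy))  # longest chain: embed→attn→resid→mlp→out = 5
```

In this framing, skip connections do not shorten the bound: only the single deepest chain of dependent operations matters, which is why deeper dense stacks would tend toward higher bounds than architectures that route computation through shallower paths.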