Position: The Turing-Completeness of Real-World Autoregressive Transformers Relies Heavily on Context Management

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses a common conflation in the literature between the “fixed-system” and “scaling-family” settings in proofs of Turing-completeness for Transformers, a conflation that has led to mischaracterizations of the computational power of practical autoregressive models. It formally distinguishes the two settings and formalizes the fixed Transformer system setting. Through computational complexity analysis and modeling of context-management mechanisms, it shows that the computational power of deployed Transformer models depends critically on their context-management strategy: different context-management methods can yield sharply different computational power. The paper thereby corrects a common misreading of existing Turing-completeness results and identifies context management as a central factor in the computational power of real-world autoregressive Transformers.

📝 Abstract

Many works make the eye-catching claim that Transformers are Turing-complete. However, the literature often conflates two distinct settings: (i) a fixed Transformer system setting, in which a fixed autoregressive Transformer is coupled with a fixed context-management method to process inputs of different lengths step by step, and (ii) a scaling-family setting, in which a family of different models (with increasing context-window length or numerical precision) is used to handle different input lengths. Existing proofs of Transformer Turing-completeness are frequently established in setting (ii), whereas real-world LLM deployment and the standard notion of Turing-completeness correspond more naturally to setting (i). In this paper, we first formalize the fixed-system setting, thereby providing a concrete characterization of how real-world LLMs operate. We then argue that results proved in the scaling-family setting provide theoretically meaningful resource bounds but do not establish Turing-completeness, thereby clarifying a common misinterpretation of existing results. Finally, we show that different context-management methods can yield sharply different computational power, and we advocate the position that context management is a central component that critically determines the computational power of real-world autoregressive Transformers.
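
The fixed-system setting described in the abstract can be illustrated with a toy sketch. This is not the paper's formalization: `toy_model`, `keep_last`, `keep_first_and_last`, `run_fixed_system`, and the window length `WINDOW = 4` are all hypothetical names and values chosen for illustration. The sketch only shows the shape of the loop, in which one fixed next-token function is paired with one fixed context-management rule, and how swapping the rule alone can change the output on a long input.

```python
from typing import Callable, List

Token = str
# A context manager maps the full history to the tokens the model may see.
ContextManager = Callable[[List[Token], int], List[Token]]

WINDOW = 4  # fixed context-window length (hypothetical value)


def keep_last(history: List[Token], window: int) -> List[Token]:
    """Sliding window: keep only the most recent `window` tokens."""
    return history[len(history) - window:] if len(history) > window else list(history)


def keep_first_and_last(history: List[Token], window: int) -> List[Token]:
    """Pin the first token and fill the rest with the most recent tokens."""
    if len(history) <= window:
        return list(history)
    return history[:1] + history[len(history) - (window - 1):]


def toy_model(visible: List[Token]) -> Token:
    """Fixed stand-in for next-token prediction: echo the first visible token.

    A real Transformer would go here; the point is only that this function
    never changes with the input length.
    """
    return visible[0]


def run_fixed_system(prompt: List[Token], manage: ContextManager, steps: int = 1) -> List[Token]:
    """Run the same model and the same context manager step by step."""
    history = list(prompt)
    for _ in range(steps):
        visible = manage(history, WINDOW)
        assert len(visible) <= WINDOW  # the model never sees more than the window
        history.append(toy_model(visible))
    return history[len(prompt):]


if __name__ == "__main__":
    prompt = list("abcdefgh")  # longer than the window
    print(run_fixed_system(prompt, keep_last))
    print(run_fixed_system(prompt, keep_first_and_last))
```

On a prompt longer than the window, the sliding-window manager has already dropped the first token, while the pinning manager still exposes it, so the same fixed model produces different outputs. On prompts that fit in the window the two managers behave identically.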

Problem

Research questions and friction points this paper is trying to address.

Turing-completeness

autoregressive Transformers

context management

fixed-system setting

computational power

Innovation

Methods, ideas, or system contributions that make the work stand out.

Turing-completeness

autoregressive Transformers

context management