Training Transformers as a Universal Computer

📅 2026-04-27
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work investigates whether the standard Transformer architecture can serve as a universal computational engine. The authors train small Transformer models to incrementally predict the small-step semantics of MicroPy programs, using PENCIL scaffolding so that execution fits within a limited context window. The training corpus consists of randomly generated programs that are syntactically valid but semantically meaningless. Even so, the models generalize out of distribution to complex, human-written programs. They successfully execute tasks such as bit copying and flipping, binary addition and multiplication, and SAT verification and solving, none of which were seen during training. These results provide empirical evidence that standard Transformers can be trained to perform universal (Turing-complete) computation.
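"Small-step semantics" means that a program state is rewritten one reduction at a time until only a value remains. A model trained to predict small-step execution is asked to produce the next state in such a trace. The sketch below illustrates the idea on a hypothetical toy arithmetic language. It is not MicroPy's actual syntax or the paper's trace format. The tuple encoding and the helper names `step`, `show`, and `trace` are assumptions made for this example.

```python
from typing import Union

# Hypothetical toy language (NOT MicroPy): an expression is either an int
# (a fully evaluated value) or a tuple (op, left, right) with op in {"add", "mul"}.
Expr = Union[int, tuple]


def is_value(e: Expr) -> bool:
    """An expression is finished once it has been reduced to a plain int."""
    return isinstance(e, int)


def step(e: Expr) -> Expr:
    """Perform exactly one small-step reduction, leftmost-innermost redex first."""
    op, left, right = e
    if not is_value(left):
        return (op, step(left), right)
    if not is_value(right):
        return (op, left, step(right))
    if op == "add":
        return left + right
    if op == "mul":
        return left * right
    raise ValueError(f"unknown operator: {op}")


def show(e: Expr) -> str:
    """Render an expression as a fully parenthesized infix string."""
    if is_value(e):
        return str(e)
    op, left, right = e
    sym = {"add": "+", "mul": "*"}[op]
    return f"({show(left)} {sym} {show(right)})"


def trace(e: Expr) -> list[str]:
    """Return every intermediate state, from the initial expression to its value."""
    states = [show(e)]
    while not is_value(e):
        e = step(e)
        states.append(show(e))
    return states


for state in trace(("mul", ("add", 1, 2), ("add", 3, 4))):
    print(state)
# ((1 + 2) * (3 + 4))
# (3 * (3 + 4))
# (3 * 7)
# 21
```

Each printed line after the first depends only on the line before it. That locality is what makes step-by-step prediction a natural target for next-token training.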
πŸ“ Abstract
We demonstrate that a small transformer can learn to execute programs in MicroPy, a simplified yet computationally universal programming language. Given procedure definitions together with an expression to evaluate, the transformer predicts small-step execution, using PENCIL scaffolding for space-efficient execution within a bounded context window. After training on randomly generated, meaningless MicroPy programs, the learned transformer generalizes to various human-written programs, including bit copying and flipping, binary addition and multiplication, and SAT verification and solving. We note that the trained model achieves out-of-distribution generalization; i.e., it correctly evaluates novel programs drawn from a distribution different from that of its training programs. Since MicroPy can express any computation, our results provide empirical evidence that a standard transformer can be trained to act as a universal computer.
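PENCIL scaffolding keeps the context bounded by erasing intermediate work once a sub-computation returns. The sketch below assumes the reduction rule described in the PENCIL paper, C [CALL] T [SEP] A [RETURN] ⇒ C A. Here C is the preceding context, T is the intermediate reasoning inside the call, and A is the answer that survives. The token strings, the toy stream, and the helpers `reduce_once` and `run` are hypothetical, and this is not the authors' implementation. The sketch also assumes that an answer never opens a new call, so the last [SEP] before a [RETURN] always belongs to the innermost call being closed.

```python
# Special tokens in the style of PENCIL; the exact token strings are illustrative.
CALL, SEP, RETURN = "[CALL]", "[SEP]", "[RETURN]"


def last_index(tokens: list[str], target: str, end: int) -> int:
    """Index of the last occurrence of `target` strictly before position `end`."""
    for i in range(end - 1, -1, -1):
        if tokens[i] == target:
            return i
    raise ValueError(f"no {target} before position {end}")


def reduce_once(tokens: list[str]) -> list[str]:
    """Rewrite C [CALL] T [SEP] A [RETURN] into C A.

    Assumes a well-formed sequence ending in [RETURN] whose answer A contains
    no special tokens, so the last [SEP] belongs to the call being closed.
    """
    sep = last_index(tokens, SEP, len(tokens) - 1)
    call = last_index(tokens, CALL, sep)
    context, answer = tokens[:call], tokens[sep + 1 : -1]
    return context + answer


def run(stream: list[str]) -> tuple[list[str], int]:
    """Feed tokens one at a time, reducing after every [RETURN].

    Returns the final context and the peak context length observed.
    """
    ctx: list[str] = []
    peak = 0
    for tok in stream:
        ctx.append(tok)
        peak = max(peak, len(ctx))
        if tok == RETURN:
            ctx = reduce_once(ctx)
    return ctx, peak


# Hypothetical token stream: an outer call whose reasoning contains a nested call.
STREAM = ["Q", CALL, "t1", "t2", CALL, "u1", "u2", SEP, "a", RETURN,
          "t3", SEP, "b", RETURN, "done"]

final, peak = run(STREAM)
print(final, peak, len(STREAM))  # ['Q', 'b', 'done'] 10 15
```

In this toy run, 15 tokens are generated, but the live context peaks at 10 and ends at 3. For longer, deeper computations, this gap between total generated tokens and peak context length is what the scaffolding exploits to stay within a bounded window.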
Problem

Research questions and friction points this paper is trying to address.

universal computation
Transformer
program execution
out-of-distribution generalization
computational universality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer
universal computation
program execution
out-of-distribution generalization
PENCIL scaffolding