Training Transformers as a Universal Computer

📅 2026-04-27

🤖 AI Summary

This work investigates whether the standard Transformer architecture can serve as a universal computational engine. The authors use PENCIL scaffolding to train small Transformer models that incrementally predict the small-step execution of MicroPy programs while staying within a bounded context window. The models are trained only on a large corpus of randomly generated programs that are syntactically valid but semantically meaningless, yet they generalize out of distribution to human-written programs. Notably, they correctly execute programs for bit copying and flipping, binary addition and multiplication, and SAT verification and solving, none of which were seen during training. Because MicroPy is computationally universal, the authors present these results as empirical evidence that a standard Transformer can be trained to act as a universal computer.
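To make "small-step execution" concrete, here is a minimal sketch of a one-step-at-a-time evaluator for a hypothetical toy language. It is not MicroPy: the tuple-based syntax, the operator names (`not`, `and`, `xor`, `if`, `call`, `var`), and the functions `step` and `trace` are illustrative assumptions. Each call to `step` rewrites exactly one subexpression, so the list returned by `trace` is the kind of intermediate-state sequence a model would be trained to predict.

```python
from typing import Union

# Hypothetical toy language (NOT MicroPy's real syntax): values are Python
# bools, compound expressions are tuples such as ("xor", True, False).
Expr = Union[bool, tuple]


def is_value(e: Expr) -> bool:
    return isinstance(e, bool)


def subst(e: Expr, env: dict) -> Expr:
    """Replace ("var", name) nodes with the values bound in env."""
    if is_value(e):
        return e
    if e[0] == "var":
        return env.get(e[1], e)
    if e[0] == "call":
        return ("call", e[1]) + tuple(subst(a, env) for a in e[2:])
    return (e[0],) + tuple(subst(a, env) for a in e[1:])


def step(e: Expr, procs: dict) -> Expr:
    """Perform exactly one reduction, leftmost-innermost first."""
    op = e[0]
    if op == "var":
        raise NameError(f"unbound variable {e[1]!r}")
    if op == "if":
        cond, then_e, else_e = e[1:]
        if not is_value(cond):
            return ("if", step(cond, procs), then_e, else_e)
        return then_e if cond else else_e
    # "call" keeps its procedure name at e[1]; other operators start operands at e[1].
    head = e[:2] if op == "call" else e[:1]
    args = e[len(head):]
    for i, a in enumerate(args):
        if not is_value(a):
            # Reduce the first non-value operand and stop: one step only.
            return head + args[:i] + (step(a, procs),) + args[i + 1:]
    if op == "call":
        # All arguments are values: substitute them into the procedure body.
        params, body = procs[e[1]]
        return subst(body, dict(zip(params, args)))
    if op == "not":
        return not args[0]
    if op == "and":
        return args[0] and args[1]
    if op == "xor":
        return args[0] != args[1]
    raise ValueError(f"unknown operator {op!r}")


def trace(e: Expr, procs: dict, max_steps: int = 10_000) -> list:
    """Return every intermediate expression from e down to a value."""
    states = [e]
    while not is_value(e):
        if len(states) > max_steps:
            raise RuntimeError("step budget exceeded")
        e = step(e, procs)
        states.append(e)
    return states


# Usage: a hypothetical bit-flipping procedure applied to (True xor False).
procs = {"flip": (("b",), ("not", ("var", "b")))}
states = trace(("call", "flip", ("xor", True, False)), procs)
# states: the original call, ("call", "flip", True), ("not", True), False
```

Leftmost-innermost order is chosen here only so that each state has a single, deterministic successor, which keeps the target sequence unambiguous; the paper's actual MicroPy evaluation order is not specified in this summary.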

📝 Abstract

We demonstrate that a small transformer can learn to execute programs in MicroPy, a simplified yet computationally universal programming language. Given procedure definitions together with an expression to evaluate, the transformer predicts small-step execution using PENCIL scaffolding for space-efficient execution within a bounded context window. After training on randomly generated, meaningless MicroPy programs, the learned transformer generalizes to various human-written programs including bit copying and flipping, binary addition and multiplication, and SAT verification and solving. We note that the trained model can achieve out-of-distribution generalization; i.e., it can evaluate novel programs drawn from a distribution different from that of the training programs. Since MicroPy can express any computation, our results provide empirical evidence that a standard transformer can be trained to act as a universal computer.
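PENCIL keeps the context bounded by erasing intermediate work once a sub-computation returns. As we understand the reduction rule from the PENCIL work, a sequence of the form `C [CALL] T [SEP] A [RETURN]` is rewritten to `C A`: the context `C` and answer `A` survive, and the scratch work `T` is discarded. The sketch below applies that rule eagerly after every generated token. The token spellings, the eager-reduction loop, and the names `reduce_once` and `run` are assumptions for illustration; the abstract does not say how MicroPy execution traces are laid out in tokens.

```python
# Special tokens for PENCIL-style scaffolding (spelling assumed here).
CALL, SEP, RETURN = "[CALL]", "[SEP]", "[RETURN]"


def _last_index(tokens: list, tok: str, end: int) -> int:
    """Index of the last occurrence of tok in tokens[:end]."""
    for i in range(end - 1, -1, -1):
        if tokens[i] == tok:
            return i
    raise ValueError(f"no {tok} before position {end}")


def reduce_once(tokens: list) -> list:
    """Rewrite C [CALL] T [SEP] A [RETURN] into C A when tokens end in [RETURN]."""
    if not tokens or tokens[-1] != RETURN:
        return tokens
    sep = _last_index(tokens, SEP, len(tokens) - 1)
    call = _last_index(tokens, CALL, sep)
    # Keep the context C and the answer A; drop the scratch work T.
    return tokens[:call] + tokens[sep + 1:-1]


def run(stream: list) -> tuple:
    """Feed tokens one at a time, reducing eagerly; report the peak context length."""
    ctx, peak = [], 0
    for tok in stream:
        ctx.append(tok)
        peak = max(peak, len(ctx))
        ctx = reduce_once(ctx)
    return ctx, peak


# A hypothetical trace with one nested sub-call inside an outer call.
stream = ["eval", "f(x)", CALL, "s1", "s2", CALL, "sub", SEP, "r1", RETURN,
          "s3", SEP, "ans", RETURN]
final, peak = run(stream)
# final == ["eval", "f(x)", "ans"], peak == 10
```

Here the stream has 14 tokens, but the context never holds more than 10 at once, and only `ans` remains from the outer call's work. Because reduction happens as soon as each `[RETURN]` appears, inner calls are already collapsed by the time an outer call returns, which is why searching for the nearest `[SEP]` and `[CALL]` suffices in this sketch.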

Problem

Research questions and friction points this paper is trying to address.

universal computation

Transformer

program execution

out-of-distribution generalization

computational universality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer

universal computation

program execution