🤖 AI Summary
Problem: The performance gains of Universal Transformers (UTs) on complex reasoning tasks such as ARC-AGI and Sudoku remain poorly understood, which hinders principled architectural design.
Method: This work identifies the recurrent inductive bias and strong nonlinearity, rather than structural complexity, as the primary drivers of the UT's reasoning capability. Building on these findings, it proposes the Universal Reasoning Model (URM), which adds lightweight short convolutions for better local pattern modeling, truncated backpropagation through time (BPTT) for more stable training and long-range reasoning, and position-aware attention with adaptive depth unfolding.
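To make "recurrent inductive bias" and "short convolution" concrete, here is a minimal NumPy sketch under stated assumptions, not the URM implementation. A UT applies the same weight-shared block repeatedly, so depth becomes recurrence. The short convolution is written as a depthwise causal convolution with a small kernel, as in common "short convolution" layers; whether URM's convolution is causal, where it sits inside the block, the ReLU MLP, the missing normalization, and all names (`short_conv`, `shared_block`, `universal_transformer`) are hypothetical choices for illustration.

```python
import numpy as np


def short_conv(x, kernel):
    """Depthwise causal 1D convolution along the sequence axis (hypothetical layout).

    x: (seq_len, d_model); kernel: (k, d_model), one length-k filter per channel.
    Output row t mixes only input rows t-k+1 .. t (zeros before position 0).
    """
    k, d = kernel.shape
    padded = np.concatenate([np.zeros((k - 1, d)), x], axis=0)
    out = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        out[t] = np.sum(padded[t:t + k] * kernel, axis=0)
    return out


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def shared_block(h, params):
    """One application of the weight-shared transition (assumed ordering)."""
    Wq, Wk, Wv, conv_kernel, W1, W2 = params
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    h = h + softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v   # global token mixing
    h = h + short_conv(h, conv_kernel)                     # local token mixing
    h = h + np.maximum(h @ W1, 0.0) @ W2                  # nonlinear MLP
    return h


def universal_transformer(x, params, n_steps):
    """Apply the SAME block n_steps times: depth is a recurrence over shared weights."""
    h = x
    for _ in range(n_steps):
        h = shared_block(h, params)
    return h


rng = np.random.default_rng(0)
seq_len, d_model, d_hidden, k = 8, 16, 32, 3
params = (
    rng.normal(0, 0.1, (d_model, d_model)),   # Wq
    rng.normal(0, 0.1, (d_model, d_model)),   # Wk
    rng.normal(0, 0.1, (d_model, d_model)),   # Wv
    rng.normal(0, 0.1, (k, d_model)),         # short-conv kernel
    rng.normal(0, 0.1, (d_model, d_hidden)),  # W1
    rng.normal(0, 0.1, (d_hidden, d_model)),  # W2
)
x = rng.normal(size=(seq_len, d_model))
y = universal_transformer(x, params, n_steps=4)
print(y.shape)  # (8, 16)
```

Because the parameter count does not grow with `n_steps`, extra reasoning depth costs compute but no extra weights, which is the sense in which the recurrence acts as an inductive bias.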
Contribution/Results: URM achieves 53.8% pass@1 on ARC-AGI 1 and 16.0% pass@1 on ARC-AGI 2, which were state-of-the-art results at the time of publication. The authors present this as the first work to systematically isolate, analyze, and empirically validate the roles of key inductive biases in UTs. By decoupling architectural components and linking each one to a reasoning mechanism, URM offers a principled basis for designing efficient, interpretable, and reasoning-capable neural architectures.
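pass@1 means each task gets a single attempt. Below is a minimal sketch of the metric, assuming each task has exactly one target grid (a nested list of color indices) and counts as solved only on an exact match, including shape. Real ARC-AGI tasks can have several test inputs, which this simplification ignores; the function name `pass_at_1` is hypothetical.

```python
def pass_at_1(predictions, targets):
    """Fraction of tasks whose single predicted grid exactly equals the target grid."""
    if len(predictions) != len(targets):
        raise ValueError("expected exactly one prediction per task")
    # Nested-list equality checks both the grid shape and every cell.
    solved = sum(1 for pred, tgt in zip(predictions, targets) if pred == tgt)
    return solved / len(targets)


targets = [[[0, 1], [1, 0]], [[2, 2, 2]], [[3], [3]]]
predictions = [[[0, 1], [1, 0]], [[2, 2]], [[3], [3]]]   # second grid has the wrong shape
print(pass_at_1(predictions, targets))  # 0.6666666666666666
```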
📝 Abstract
Universal Transformers (UTs) have been widely used for complex reasoning tasks such as ARC-AGI and Sudoku, yet the specific sources of their performance gains remain underexplored. In this work, we systematically analyze UT variants and show that improvements on ARC-AGI primarily arise from the recurrent inductive bias and strong nonlinear components of the Transformer, rather than from elaborate architectural designs. Motivated by this finding, we propose the Universal Reasoning Model (URM), which enhances the UT with short convolution and truncated backpropagation. Our approach substantially improves reasoning performance, achieving state-of-the-art 53.8% pass@1 on ARC-AGI 1 and 16.0% pass@1 on ARC-AGI 2. Our code is available at https://github.com/zitian-gao/URM.
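In a UT, the "time" axis of truncated backpropagation is the recurrent depth: the same block is applied many times, and gradients flow back through only the most recent applications. The sketch below illustrates that idea on a toy recurrence `h <- tanh(W h + x)` with one shared matrix `W` and a squared-error loss on the final state, assuming earlier states are simply treated as constants (detached). The toy recurrence, the loss, and which steps are kept are illustrative assumptions, not URM's actual training setup; all function names are hypothetical.

```python
import numpy as np


def unroll(W, x, h0, n_steps):
    """Apply the shared transition h <- tanh(W h + x) n_steps times; keep every state."""
    hs = [h0]
    for _ in range(n_steps):
        hs.append(np.tanh(W @ hs[-1] + x))
    return hs


def loss(W, x, h0, target, n_steps):
    return 0.5 * np.sum((unroll(W, x, h0, n_steps)[-1] - target) ** 2)


def grad_W(W, x, h0, target, n_steps, k):
    """dLoss/dW, backpropagating through only the last k applications.

    States before step n_steps - k are treated as constants (detached), so
    k >= n_steps gives full BPTT and smaller k gives truncated BPTT.
    """
    hs = unroll(W, x, h0, n_steps)
    g_h = hs[-1] - target                       # dL/dh_n
    g_W = np.zeros_like(W)
    for t in range(n_steps, max(n_steps - k, 0), -1):
        g_pre = g_h * (1.0 - hs[t] ** 2)        # back through tanh at step t
        g_W += np.outer(g_pre, hs[t - 1])       # shared W is used at every step
        g_h = W.T @ g_pre                       # pass the gradient to h_{t-1}
    return g_W


rng = np.random.default_rng(0)
d, n_steps = 4, 8
W = rng.normal(0, 0.5, (d, d))
x, h0, target = rng.normal(size=d), np.zeros(d), rng.normal(size=d)

full = grad_W(W, x, h0, target, n_steps, k=n_steps)
trunc = grad_W(W, x, h0, target, n_steps, k=2)
print(np.linalg.norm(full), np.linalg.norm(trunc))
```

Truncation bounds how many Jacobian products the gradient passes through, which limits memory and the risk of exploding or vanishing gradients across many recurrent steps, at the cost of a biased gradient estimate.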