🤖 AI Summary
This study investigates whether the two-step refinement mechanism of Tiny Recursive Models (TRMs) can be integrated into autoregressive architectures and examines the sources of its potential performance gains. Under strictly controlled conditions—consistent module design, token flow, and prediction objectives—a series of comparative models is constructed, ranging from standard Transformers to fully autoregressive TRMs, and evaluated on character-level algorithmic tasks. The TRM mechanism is adapted to an autoregressive framework for the first time under controlled variables. While the full autoregressive TRM fails to yield consistent improvements, certain simplified two-step refinement baselines perform strongly, revealing that the benefits of the refinement mechanism can arise independently of complex recursive structures. This finding suggests a promising direction for lightweight and efficient inference architectures.
📝 Abstract
Tiny Recursive Models (TRMs) have recently demonstrated remarkable performance on ARC-AGI, showing that very small models can compete with large foundation models through a two-step refinement mechanism that updates an internal reasoning state $z$ and the predicted output $y$. Such refinement is of interest for any predictor, so it is natural to ask whether the TRM mechanism could be effectively adopted in autoregressive models. However, TRMs cannot simply be compared to standard models: they lack causal predictive structure and contain persistent latent states, which makes it difficult to isolate specific performance gains. In this paper, we propose the Autoregressive TRM and evaluate it on small autoregressive tasks. To understand its efficacy, we introduce a suite of models that gradually transform a standard Transformer into a Tiny Autoregressive Recursive Model in a controlled setting that fixes the block design, token stream, and next-token objective. Across compute-matched experiments on character-level algorithmic tasks, we find, surprisingly, that some two-step refinement baselines show strong performance. Contrary to expectations, we find no reliable performance gains from the full Autoregressive TRM architecture. These results offer promise for two-step refinement mechanisms more broadly but caution against pursuing the autoregressive TRM-specific architecture as a research direction.
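The two-step refinement loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `f_z` and `f_y` stand in for the shared tiny network (real TRMs use a small Transformer with learned embeddings), and the step counts and toy linear maps are hypothetical.

```python
import numpy as np

def refine(x, y, z, f_z, f_y, n_latent=4, n_outer=3):
    """TRM-style two-step refinement (illustrative sketch).

    x: input embedding, y: current answer embedding, z: latent reasoning state.
    Step 1 repeatedly updates z from (x, y, z); step 2 updates y from (y, z).
    """
    for _ in range(n_outer):          # outer refinement rounds
        for _ in range(n_latent):     # inner latent-reasoning steps
            z = f_z(x, y, z)          # step 1: refine the reasoning state z
        y = f_y(y, z)                 # step 2: refine the prediction y
    return y, z

# Toy stand-ins for the network: random linear maps with tanh (illustrative only).
rng = np.random.default_rng(0)
d = 8
Wz = rng.normal(size=(3 * d, d)) * 0.1
Wy = rng.normal(size=(2 * d, d)) * 0.1
f_z = lambda x, y, z: np.tanh(np.concatenate([x, y, z]) @ Wz)
f_y = lambda y, z: np.tanh(np.concatenate([y, z]) @ Wy)

x = rng.normal(size=d)
y, z = np.zeros(d), np.zeros(d)
y_new, z_new = refine(x, y, z, f_z, f_y)
print(y_new.shape)  # (8,)
```

The point of the sketch is the control flow: the latent state $z$ is updated several times before the answer $y$ is touched, and the pair is then refined over multiple outer rounds.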