🤖 AI Summary
This work addresses the low training efficiency of small recurrent architectures—such as Tiny Recursive Models (TRMs)—on complex reasoning tasks. We propose a lightweight enhancement that recasts implicit reasoning as classifier-free policy optimization and introduces depth-wise supervision at each recurrence step, eliminating conventional halting mechanisms. Our approach integrates implicit policy optimization, depth-wise supervised training, and forward-pass compression. With only 0.8M parameters, it achieves 24% accuracy on ARC-1 while reducing forward computation by 18×; its performance matches that of standard TRMs and substantially surpasses that of most large language models. The core contribution is the first end-to-end reformulation of TRM reasoning as classifier-free policy learning—decoupling inference from external classification modules—thereby significantly improving both training efficiency and inference compactness.
📝 Abstract
Recently, it was shown that small, looped architectures, such as Tiny Recursive Models (TRMs), can outperform Large Language Models (LLMs) on complex reasoning tasks, including the Abstraction and Reasoning Corpus (ARC). In this work, we investigate a core question: how can we further improve the efficiency of these methods with minimal changes? To address this, we frame the latent reasoning of TRMs as a form of classifier-free guidance and an implicit policy-improvement algorithm. Building on these insights, we propose a novel training scheme that provides a supervision target for each loop during training. We demonstrate that our approach significantly enhances training efficiency: it reduces the total number of forward passes by 18× and eliminates halting mechanisms, while maintaining quality comparable to standard TRMs. Notably, we achieve 24% accuracy on ARC-1 with only 0.8M parameters, outperforming most LLMs.
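The key mechanism — supervising every recurrence step with a target instead of learning a halting head — can be sketched as follows. This is a minimal toy illustration, not the paper's actual architecture or code: the class name, shapes, loop count, and the choice of a shared MSE target per loop are all assumptions made for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyRecurrent:
    """Toy weight-tied recurrent reasoner (illustrative stand-in for a TRM)."""

    def __init__(self, d_in, d_hidden, d_out, n_loops=4):
        self.Wx = rng.normal(scale=0.1, size=(d_in, d_hidden))
        self.Wh = rng.normal(scale=0.1, size=(d_hidden, d_hidden))
        self.Wo = rng.normal(scale=0.1, size=(d_hidden, d_out))
        self.n_loops = n_loops

    def forward(self, x):
        # One shared block applied recurrently; emit a prediction at every loop
        # instead of deciding when to halt.
        h = np.tanh(x @ self.Wx)
        preds = []
        for _ in range(self.n_loops):
            h = np.tanh(h @ self.Wh + x @ self.Wx)  # weight-tied recurrence
            preds.append(h @ self.Wo)               # read out at each step
        return preds

def depthwise_loss(preds, target):
    # Depth-wise supervision: every loop's prediction is scored against the
    # target, so each recurrence step receives a training signal and no
    # separate halting mechanism is needed.
    return sum(float(np.mean((p - target) ** 2)) for p in preds) / len(preds)

x = rng.normal(size=(2, 8))   # batch of 2 toy inputs
y = rng.normal(size=(2, 3))   # toy targets
model = TinyRecurrent(8, 16, 3)
preds = model.forward(x)
loss = depthwise_loss(preds, y)
```

In this sketch every loop is trained toward the same final target; the paper's scheme differs in how the per-loop targets are constructed, but the structural point is the same: with a loss attached to each recurrence step, training no longer depends on a halting decision.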