🤖 AI Summary
This work addresses the low training efficiency of small recurrent architectures—such as Tiny Recursive Models (TRMs)—on complex reasoning tasks. We propose a lightweight enhancement that recasts implicit reasoning as classifier-free policy optimization and introduces depth-wise supervision at each recurrence step, eliminating conventional halting mechanisms. Our approach integrates implicit policy optimization, depth-wise supervised training, and forward-pass compression. With only 0.8M parameters, it achieves 24% accuracy on ARC-1 while reducing forward computation by 18×; its performance matches that of standard TRMs and substantially surpasses that of most large language models. The core contribution is the first end-to-end reformulation of TRM reasoning as classifier-free policy learning—decoupling inference from external classification modules—thereby significantly improving both training efficiency and inference compactness.
📝 Abstract
Recently, it was shown that small, looped architectures, such as Tiny Recursive Models (TRMs), can outperform Large Language Models (LLMs) on complex reasoning tasks, including the Abstraction and Reasoning Corpus (ARC). In this work, we investigate a core question: how can we further improve the efficiency of these methods with minimal changes? To address this, we frame the latent reasoning of TRMs as a form of classifier-free guidance and an implicit policy-improvement algorithm. Building on these insights, we propose a novel training scheme that provides a supervision target for each loop during training. We demonstrate that our approach significantly enhances training efficiency: it reduces the total number of forward passes by 18× and eliminates halting mechanisms, while maintaining quality comparable to standard TRMs. Notably, we achieve 24% accuracy on ARC-1 with only 0.8M parameters, outperforming most LLMs.
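The key mechanism — supervising every recurrence step with a target instead of learning a halting head — can be sketched as follows. This is a minimal toy illustration, not the paper's actual architecture or code: the class name, shapes, loop count, and the choice of a shared MSE target per loop are all assumptions made for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyRecurrent:
    """Toy weight-tied recurrent reasoner (illustrative stand-in for a TRM)."""

    def __init__(self, d_in, d_hidden, d_out, n_loops=4):
        self.Wx = rng.normal(scale=0.1, size=(d_in, d_hidden))
        self.Wh = rng.normal(scale=0.1, size=(d_hidden, d_hidden))
        self.Wo = rng.normal(scale=0.1, size=(d_hidden, d_out))
        self.n_loops = n_loops

    def forward(self, x):
        # One shared block applied recurrently; emit a prediction at every loop
        # instead of deciding when to halt.
        h = np.tanh(x @ self.Wx)
        preds = []
        for _ in range(self.n_loops):
            h = np.tanh(h @ self.Wh + x @ self.Wx)  # weight-tied recurrence
            preds.append(h @ self.Wo)               # read out at each step
        return preds

def depthwise_loss(preds, target):
    # Depth-wise supervision: every loop's prediction is scored against the
    # target, so each recurrence step receives a training signal and no
    # separate halting mechanism is needed.
    return sum(float(np.mean((p - target) ** 2)) for p in preds) / len(preds)

x = rng.normal(size=(2, 8))   # batch of 2 toy inputs
y = rng.normal(size=(2, 3))   # toy targets
model = TinyRecurrent(8, 16, 3)
preds = model.forward(x)
loss = depthwise_loss(preds, y)
```

In this sketch every loop is trained toward the same final target; the paper's scheme differs in how the per-loop targets are constructed, but the structural point is the same: with a loss attached to each recurrence step, training no longer depends on a halting decision.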