🤖 AI Summary
Iterative reasoning models can scale test-time compute by repeatedly updating a latent state, but it remains unclear what lets them generalize beyond memorized patterns. This work proposes the Equilibrium Reasoner (EqR), which formulates scalable reasoning as learning a task-conditioned attractor dynamical system whose stable fixed points correspond to valid solutions. This lets EqR allocate computation adaptively without external verifiers or task-specific priors. By iterating its latent state for many steps (depth) and aggregating stochastic trajectories from multiple initial states (breadth), EqR raises accuracy on Sudoku-Extreme from 2.6% for feedforward models to over 99%, with unrolling equivalent to up to 40,000 layers. Gains from test-time scaling are tightly coupled with stronger convergence toward solution-aligned attractors, supporting scaling along both depth and breadth.
📝 Abstract
Scaling test-time compute by iteratively updating a latent state has emerged as a powerful paradigm for reasoning. Yet the internal mechanisms that enable these iterative models to generalize beyond memorized patterns remain unclear. We hypothesize that generalizable reasoning arises from learning task-conditioned attractors: latent dynamical systems whose stable fixed points correspond to valid solutions.
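To make the hypothesis concrete, the latent dynamics can be written as follows. The notation is introduced here for illustration only; the abstract fixes none. Here x is the task input, z_t the latent state after t updates, f_θ the learned update, and dec a read-out map. A solution-aligned attractor is then a fixed point z* that is locally stable and decodes to a valid solution; the spectral-radius condition is a standard sufficient condition for local stability of a fixed point of a differentiable map.

```latex
z_{t+1} = f_\theta(z_t;\, x), \qquad
z^\ast = f_\theta(z^\ast;\, x), \qquad
\rho\!\left(\frac{\partial f_\theta}{\partial z}\Big|_{z = z^\ast}\right) < 1, \qquad
\operatorname{dec}(z^\ast) \in \mathcal{S}(x)
```

Here ρ denotes the spectral radius of the Jacobian and S(x) is a hypothetical set of valid solutions for task x.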
We formalize this process through Equilibrium Reasoners (EqR), which enable test-time scaling without external verifiers or task-specific priors. EqR scales internal dynamics along two axes: depth, by running more iterations, and breadth, by aggregating stochastic trajectories from multiple initializations. Empirically, gains from test-time scaling are tightly coupled with stronger convergence toward solution-aligned attractors.
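A minimal sketch of the two scaling axes, under assumptions of our own rather than the paper's method: a one-dimensional toy update `tanh(2z + x)` stands in for the learned network (for x = ±0.5 it has two stable fixed points), `decode` reads the sign of the state as the "answer", convergence is declared when successive states differ by less than `tol`, and breadth is aggregated by majority vote. The abstract does not say which aggregation rule EqR uses; every name below is hypothetical.

```python
from __future__ import annotations

import math
import random
from collections import Counter
from typing import Callable

Step = Callable[[float], float]


def iterate_to_fixed_point(step: Step, z0: float, tol: float = 1e-8,
                           max_iters: int = 100_000) -> tuple[float, int]:
    """Depth axis: apply `step` until successive states differ by less than tol.

    Returns the final state and the number of iterations actually used.
    """
    z = z0
    for t in range(1, max_iters + 1):
        z_next = step(z)
        if abs(z_next - z) < tol:  # converged: treat z_next as the fixed point
            return z_next, t
        z = z_next
    return z, max_iters  # budget exhausted without converging


def breadth_vote(step: Step, decode: Callable[[float], int],
                 n_inits: int = 101, seed: int = 0) -> int:
    """Breadth axis: run from many random starts and majority-vote the answers."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n_inits):
        z0 = rng.uniform(-1.0, 1.0)  # stochastic initialization
        z_star, _ = iterate_to_fixed_point(step, z0)
        answers.append(decode(z_star))
    return Counter(answers).most_common(1)[0][0]


def make_step(x: float) -> Step:
    """Hypothetical task-conditioned update; x plays the role of the task input."""
    return lambda z: math.tanh(2.0 * z + x)


def decode(z: float) -> int:
    """Hypothetical read-out: the sign of the latent state is the 'answer'."""
    return 1 if z > 0 else -1


print(breadth_vote(make_step(0.5), decode))   # most starts land in the positive basin
print(breadth_vote(make_step(-0.5), decode))  # mirrored landscape, negative basin wins
```

With x = 0.5 roughly four out of five uniform starts fall into the basin of the positive attractor, so individual trajectories can still end at the wrong fixed point while the vote is robust; flipping the sign of x mirrors the landscape.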
This attractor perspective allows neural networks to adaptively allocate test-time compute based on task difficulty. While simple cases converge within 1 to 5 iteration steps, harder cases benefit from massive test-time scaling. By unrolling up to the equivalent of 40,000 layers, scalable latent reasoning boosts accuracy from 2.6% for feedforward models to over 99% on Sudoku-Extreme. These results suggest that learned attractor landscapes provide a useful mechanistic lens for understanding scalable reasoning in iterative latent models.
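A toy illustration of why convergence-based halting allocates compute by difficulty, under an assumption of our own: "difficulty" is modeled as the contraction factor a of a linear map z ← a·z + b, and iteration stops once successive states differ by less than `tol`. The abstract does not specify EqR's halting rule, and real models are nonlinear learned networks; `steps_to_converge` is a hypothetical helper. For this map the step count grows roughly as log(tol/b) / log(a), so slowly contracting ("hard") instances consume far more iterations than fast ones.

```python
def steps_to_converge(a: float, b: float = 1.0, tol: float = 1e-6,
                      max_iters: int = 100_000) -> int:
    """Count iterations of the contraction z <- a*z + b (0 < a < 1) from z = 0
    until successive states differ by less than tol (convergence-based halting)."""
    z = 0.0
    for t in range(1, max_iters + 1):
        z_next = a * z + b
        if abs(z_next - z) < tol:
            return t
        z = z_next
    return max_iters


# Smaller a means faster contraction toward the fixed point b / (1 - a),
# used here as a stand-in for an "easy" instance.
for a in (0.1, 0.9, 0.999):
    print(f"a={a}: {steps_to_converge(a)} iterations")
```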