🤖 AI Summary
This work targets test-time scaling for latent recurrent models on complex reasoning tasks such as Sudoku and maze solving by proposing C-voting, a test-time strategy that requires no additional training. C-voting does not rely on an explicit energy function: it initializes the latent state randomly to produce multiple candidate trajectories, then selects the candidate whose predictions have the highest confidence, defined as the mean top-1 probability. Combined with ItrSA++, a simple attention-based recurrent model with randomized initial values, C-voting reaches 95.2% on Sudoku-extreme and 78.6% on Maze, compared with 55.0% and 74.5% for HRM. It also yields 4.9% higher accuracy on Sudoku-hard than energy-based voting, which applies only to models with explicit energy functions.
📝 Abstract
Neural network models with latent recurrent processing, where identical layers are recursively applied to the latent state, have gained attention as promising models for performing reasoning tasks. A strength of such models is that they enable test-time scaling, where the models can enhance their performance in the test phase without additional training. Models such as the Hierarchical Reasoning Model (HRM) and Artificial Kuramoto Oscillatory Neurons (AKOrN) can facilitate deeper reasoning by increasing the number of recurrent steps, thereby enabling the completion of challenging tasks, including Sudoku, Maze solving, and AGI benchmarks. In this work, we introduce confidence-based voting (C-voting), a test-time scaling strategy designed for recurrent models with multiple latent candidate trajectories. Initializing the latent state with multiple candidates using random variables, C-voting selects the one maximizing the average of top-1 probabilities of the predictions, reflecting the model's confidence. Additionally, it yields 4.9% higher accuracy on Sudoku-hard than the energy-based voting strategy, which is specific to models with explicit energy functions. An essential advantage of C-voting is its applicability: it can be applied to recurrent models without requiring an explicit energy function. Finally, we introduce a simple attention-based recurrent model with randomized initial values named ItrSA++, and demonstrate that when combined with C-voting, it outperforms HRM on Sudoku-extreme (95.2% vs. 55.0%) and Maze (78.6% vs. 74.5%) tasks.
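The selection rule described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `c_vote`, the `(K, N, C)` logit layout (K candidate trajectories, N output positions such as Sudoku cells, C classes), and the toy random logits are all hypothetical stand-ins for the outputs of a recurrent model run from K random initial latent states.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def c_vote(candidate_logits):
    """Pick the candidate with the highest mean top-1 probability.

    candidate_logits: array of shape (K, N, C), assumed to hold the final
    logits of K candidate trajectories (hypothetical layout).
    Returns (index of chosen candidate, its predictions, all confidences).
    """
    probs = softmax(candidate_logits, axis=-1)      # (K, N, C)
    top1 = probs.max(axis=-1)                       # (K, N) top-1 prob per position
    confidence = top1.mean(axis=-1)                 # (K,) mean top-1 prob per candidate
    best = int(np.argmax(confidence))               # most confident candidate
    prediction = probs[best].argmax(axis=-1)        # (N,) predicted class per position
    return best, prediction, confidence


# Toy stand-in for a model's outputs: 4 candidates, 81 cells, 9 digits.
rng = np.random.default_rng(0)
toy_logits = rng.normal(size=(4, 81, 9))
toy_logits[2] *= 5.0  # make candidate 2 artificially sharper (more confident)

best, prediction, confidence = c_vote(toy_logits)
print("chosen candidate:", best)
print("per-candidate confidence:", np.round(confidence, 3))
```

Because the rule needs only the predicted class probabilities, it applies to any model that produces multiple candidates, which is the applicability advantage the abstract contrasts with energy-based voting.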