Hierarchical Reasoning Model

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from fragile task decomposition, high data requirements, and substantial inference latency when relying on chain-of-thought (CoT) reasoning for complex, goal-directed tasks. Method: We propose a bi-level recurrent reasoning architecture inspired by multi-scale neural processing in the human brain: a high-level “slow” module performs abstract planning, while a low-level “fast” module executes fine-grained computations—enabling implicit task decomposition and end-to-end single-step forward inference. Crucially, our approach introduces unsupervised intermediate process modeling, requiring neither CoT annotations nor CoT-specific pretraining. Contribution/Results: With only 27M parameters and 1,000 training samples, our method achieves strong generalization: it outperforms large-scale models on the ARC benchmark, solves 98.7% of complex Sudoku puzzles without any CoT data, attains 99.2% accuracy in discovering optimal paths in large mazes, and exhibits improved training stability and 3.8× lower inference latency.

📝 Abstract
Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose the Hierarchical Reasoning Model (HRM), a novel recurrent architecture that attains significant computational depth while maintaining both training stability and efficiency. HRM executes sequential reasoning tasks in a single forward pass without explicit supervision of the intermediate process, through two interdependent recurrent modules: a high-level module responsible for slow, abstract planning, and a low-level module handling rapid, detailed computations. With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples. The model operates without pre-training or CoT data, yet achieves nearly perfect performance on challenging tasks including complex Sudoku puzzles and optimal path finding in large mazes. Furthermore, HRM outperforms much larger models with significantly longer context windows on the Abstraction and Reasoning Corpus (ARC), a key benchmark for measuring artificial general intelligence capabilities. These results underscore HRM's potential as a transformative advancement toward universal computation and general-purpose reasoning systems.
Problem

Research questions and friction points this paper is trying to address.

Addresses AI's challenge in complex goal-oriented reasoning
Overcomes limitations of Chain-of-Thought techniques in LLMs
Proposes efficient hierarchical model for universal computation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical recurrent modules for multi-scale reasoning
Single forward pass without intermediate supervision
Efficient performance with minimal training data
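The two-timescale recurrence described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual implementation: the hidden width, step counts, weight shapes, and tanh updates are all assumptions chosen to show the control flow (a slow high-level state updated once per cycle, a fast low-level state updated every step and conditioned on the high-level plan), all within a single forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16        # hidden width (illustrative, not from the paper)
T = 4         # low-level "fast" steps per high-level update
N_CYCLES = 3  # high-level "slow" cycles

# Random matrices stand in for trained parameters (hypothetical shapes).
W_h = rng.normal(scale=0.1, size=(D, D))   # high-level recurrence
W_l = rng.normal(scale=0.1, size=(D, D))   # low-level recurrence
W_x = rng.normal(scale=0.1, size=(D, D))   # input projection
W_hl = rng.normal(scale=0.1, size=(D, D))  # high-level -> low-level conditioning
W_lh = rng.normal(scale=0.1, size=(D, D))  # low-level -> high-level feedback

def hrm_forward(x):
    """One forward pass: N_CYCLES slow updates, each driving T fast steps."""
    z_h = np.zeros(D)  # slow, abstract planning state
    z_l = np.zeros(D)  # fast, detailed computation state
    for _ in range(N_CYCLES):
        for _ in range(T):
            # The low-level module runs every step, conditioned on the current plan.
            z_l = np.tanh(W_l @ z_l + W_hl @ z_h + W_x @ x)
        # The high-level module updates once per cycle from the low-level outcome.
        z_h = np.tanh(W_h @ z_h + W_lh @ z_l)
    return z_h

out = hrm_forward(rng.normal(size=D))
print(out.shape)  # (16,)
```

Because the low-level state is reset only implicitly through its dynamics, the model reaches an effective depth of N_CYCLES * T recurrent steps without any chain-of-thought supervision of the intermediate states.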
Guan Wang
Sapient Intelligence, Singapore
Jin Li
Sapient Intelligence, Singapore
Yuhao Sun
Sapient Intelligence, Singapore
Xing Chen
Sapient Intelligence, Singapore
Changling Liu
Sapient Intelligence, Singapore
Yue Wu
Sapient Intelligence, Singapore
Meng Lu
Sapient Intelligence, Singapore
Sen Song
Laboratory of Brain and Intelligence, Department of Biomedical Engineering, Tsinghua University
Brain-inspired Computation, Computational Neuroscience, Artificial General Intelligence, Science of Happiness, Neural Circuits
Yasin Abbasi Yadkori
Sapient Intelligence
Artificial Intelligence, Machine Learning, Reinforcement Learning