SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion

📅 2026-02-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes SpiralFormer, an architecture that addresses a limitation of conventional recurrent (looped) Transformers: because they operate at a fixed, full-sequence resolution, they model hierarchical dependencies inefficiently in both parameters and compute. SpiralFormer introduces, for the first time, a multi-resolution recurrence mechanism that dynamically compresses the input sequence and processes multi-scale representations across successive loop iterations, enabling effective hierarchical dependency modeling. By treating sequence resolution as a new dimension of the recurrent architecture, the model achieves cross-scale functional specialization while sharing weights across iterations. Experiments show that, across model sizes from 160M to 1.4B parameters, SpiralFormer consistently outperforms both recurrent and non-recurrent baselines in parameter and computational efficiency.
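A minimal sketch of the idea, grounded only in the summary above: one weight-shared Transformer block is applied repeatedly while a compression schedule varies the sequence resolution per loop iteration. The `MultiResolutionLoop` module, the pooling-based compression, and the `(1, 2, 4, 2, 1)` schedule are illustrative assumptions, not the paper's actual design.

```python
# Illustrative sketch of multi-resolution recursion (assumed design,
# not the paper's implementation): a single weight-shared Transformer
# block is looped, with the sequence pooled to a coarser resolution at
# some iterations and re-expanded at others.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionLoop(nn.Module):
    def __init__(self, d_model=512, n_heads=8, schedule=(1, 2, 4, 2, 1)):
        super().__init__()
        # Shared parameters: the same block is reused at every iteration.
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        # Hypothetical schedule: compression factor per loop iteration
        # (1 = full resolution, 4 = a 4x shorter latent sequence).
        self.schedule = schedule

    def forward(self, x):  # x: (batch, seq_len, d_model)
        full_len = x.size(1)
        for factor in self.schedule:
            # Compress: average-pool tokens down to seq_len // factor positions.
            h = F.adaptive_avg_pool1d(
                x.transpose(1, 2),
                max(full_len // factor, 1)).transpose(1, 2)
            h = self.block(h)  # same weights applied at every scale
            # Expand back to full resolution and refine residually.
            h = F.interpolate(h.transpose(1, 2), size=full_len,
                              mode="nearest").transpose(1, 2)
            x = x + h
        return x

# Usage: five loop iterations over the same parameters, each at a
# (possibly) different sequence resolution.
y = MultiResolutionLoop()(torch.randn(2, 128, 512))
print(y.shape)  # torch.Size([2, 128, 512])
```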

📝 Abstract
Recursive (looped) Transformers decouple computational depth from parameter depth by repeatedly applying shared layers, providing an explicit architectural primitive for iterative refinement and latent reasoning. However, early looped Transformers often underperform non-recursive baselines of equal compute. While recent literature has introduced more effective recursion mechanisms to mitigate this gap, existing architectures still operate at a fixed, full-token resolution, neglecting the potential efficiency of computing over compressed latent representations. In this paper, we propose SpiralFormer, a looped Transformer that executes recurrence under a multi-resolution recursion schedule. We provide probing evidence that multi-resolution recursion enables the model to learn hierarchical dependencies by inducing iteration-wise functional specialization across different scales. Empirically, SpiralFormer achieves better parameter and compute efficiency than both looped and non-looped baselines across model scales from 160M to 1.4B, establishing sequence resolution as a potential axis for scaling recursive architectures.
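Since the abstract attributes compute efficiency to recursing over compressed latent representations, a back-of-envelope check helps: self-attention cost grows roughly quadratically with sequence length, so loop iterations executed at coarser resolutions are far cheaper than full-resolution ones. The sequence length, width, and schedule below are assumed purely for illustration.

```python
# Back-of-envelope comparison (illustrative numbers, not from the paper):
# per-layer self-attention cost scales roughly as n^2 * d, so iterations
# run on a compressed latent sequence cost a fraction of full-resolution ones.
def attn_flops(n, d):
    return n * n * d  # leading-order term only

n, d, iters = 4096, 1024, 4
fixed = iters * attn_flops(n, d)                           # fixed full resolution
spiral = sum(attn_flops(n // f, d) for f in (1, 2, 4, 2))  # assumed schedule
print(f"fixed-resolution loop: {fixed:.3e} FLOPs")
print(f"multi-resolution loop: {spiral:.3e} FLOPs "
      f"({spiral / fixed:.0%} of fixed)")
```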
Problem

Research questions and friction points this paper is trying to address.

looped Transformers
hierarchical dependencies
multi-resolution recursion
computational efficiency
latent representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-resolution Recursion
Looped Transformers
Hierarchical Dependencies
Parameter Efficiency
Latent Representation
Authors
Chengting Yu (Zhejiang University)
Xiaobo Shu (Alibaba Group)
Yadao Wang (Alibaba Group)
Yizhen Zhang (Alibaba Group)
Haoyi Wu (ShanghaiTech University)
You Wu (ShanghaiTech University)
Rujiao Long (Tsinghua University, Alibaba)
Ziheng Chen (Alibaba Group)
Yuchi Xu (Alibaba Group)
Wenbo Su (Alibaba Group)
Bo Zheng (Alibaba Group)