SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion

📅 2026-02-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes SpiralFormer, an architecture that addresses a limitation of conventional recurrent (looped) Transformers: because they operate at a fixed, full-sequence resolution, they model hierarchical dependencies inefficiently in both parameters and compute. SpiralFormer introduces, for the first time, a multi-resolution recurrence mechanism that dynamically compresses the input sequence and processes multi-scale representations across successive loop iterations, enabling effective hierarchical dependency modeling. By treating sequence resolution as a new dimension of the recurrent architecture, the model achieves cross-scale functional specialization while sharing weights across iterations. Experiments show that, across model sizes from 160M to 1.4B parameters, SpiralFormer consistently outperforms both recurrent and non-recurrent baselines in parameter and computational efficiency.
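A minimal sketch of the idea, grounded only in the summary above: one weight-shared Transformer block is applied repeatedly while a compression schedule varies the sequence resolution per loop iteration. The `MultiResolutionLoop` module, the pooling-based compression, and the `(1, 2, 4, 2, 1)` schedule are illustrative assumptions, not the paper's actual design.

```python
# Illustrative sketch of multi-resolution recursion (assumed design,
# not the paper's implementation): a single weight-shared Transformer
# block is looped, with the sequence pooled to a coarser resolution at
# some iterations and re-expanded at others.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionLoop(nn.Module):
    def __init__(self, d_model=512, n_heads=8, schedule=(1, 2, 4, 2, 1)):
        super().__init__()
        # Shared parameters: the same block is reused at every iteration.
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        # Hypothetical schedule: compression factor per loop iteration
        # (1 = full resolution, 4 = a 4x shorter latent sequence).
        self.schedule = schedule

    def forward(self, x):  # x: (batch, seq_len, d_model)
        full_len = x.size(1)
        for factor in self.schedule:
            # Compress: average-pool tokens down to seq_len // factor positions.
            h = F.adaptive_avg_pool1d(
                x.transpose(1, 2),
                max(full_len // factor, 1)).transpose(1, 2)
            h = self.block(h)  # same weights applied at every scale
            # Expand back to full resolution and refine residually.
            h = F.interpolate(h.transpose(1, 2), size=full_len,
                              mode="nearest").transpose(1, 2)
            x = x + h
        return x

# Usage: five loop iterations over the same parameters, each at a
# (possibly) different sequence resolution.
y = MultiResolutionLoop()(torch.randn(2, 128, 512))
print(y.shape)  # torch.Size([2, 128, 512])
```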

📝 Abstract
Recursive (looped) Transformers decouple computational depth from parameter depth by repeatedly applying shared layers, providing an explicit architectural primitive for iterative refinement and latent reasoning. However, early looped Transformers often underperform non-recursive baselines of equal compute. While recent literature has introduced more effective recursion mechanisms to mitigate this gap, existing architectures still operate at a fixed, full-token resolution, neglecting the potential efficiency of computing over compressed latent representations. In this paper, we propose SpiralFormer, a looped Transformer that executes recurrence under a multi-resolution recursion schedule. We provide probing evidence that multi-resolution recursion enables the model to learn hierarchical dependencies by inducing iteration-wise functional specialization across different scales. Empirically, SpiralFormer achieves better parameter and compute efficiency than both looped and non-looped baselines across model scales from 160M to 1.4B, establishing sequence resolution as a potential axis for scaling recursive architectures.
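Since the abstract attributes compute efficiency to recursing over compressed latent representations, a back-of-envelope check helps: self-attention cost grows roughly quadratically with sequence length, so loop iterations executed at coarser resolutions are far cheaper than full-resolution ones. The sequence length, width, and schedule below are assumed purely for illustration.

```python
# Back-of-envelope comparison (illustrative numbers, not from the paper):
# per-layer self-attention cost scales roughly as n^2 * d, so iterations
# run on a compressed latent sequence cost a fraction of full-resolution ones.
def attn_flops(n, d):
    return n * n * d  # leading-order term only

n, d, iters = 4096, 1024, 4
fixed = iters * attn_flops(n, d)                           # fixed full resolution
spiral = sum(attn_flops(n // f, d) for f in (1, 2, 4, 2))  # assumed schedule
print(f"fixed-resolution loop: {fixed:.3e} FLOPs")
print(f"multi-resolution loop: {spiral:.3e} FLOPs "
      f"({spiral / fixed:.0%} of fixed)")
```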
Problem

Research questions and friction points this paper is trying to address.

looped Transformers
hierarchical dependencies
multi-resolution recursion
computational efficiency
latent representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-resolution Recursion
Looped Transformers
Hierarchical Dependencies
Parameter Efficiency
Latent Representation
Authors
Chengting Yu (Zhejiang University)
Xiaobo Shu (Alibaba Group)
Yadao Wang (Alibaba Group)
Yizhen Zhang (Alibaba Group)
Haoyi Wu (ShanghaiTech University)
You Wu (ShanghaiTech University)
Rujiao Long (Tsinghua University, Alibaba)
Ziheng Chen (Alibaba Group)
Yuchi Xu (Alibaba Group)
Wenbo Su (Alibaba Group)
Bo Zheng (Alibaba Group)