Reasoning with Latent Thoughts: On the Power of Looped Transformers

📅 2025-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether the reasoning capability of large language models (LLMs) inherently requires scaling model parameters, or whether computational depth, and thus reasoning capacity, can be expanded efficiently via recurrent computation. The authors study looped transformers, in which a *k*-layer block is applied *L* times over the same parameters, yielding an effective depth of *kL*. On synthetic reasoning tasks (addition, *p*-hop induction, math problems), a looped model nearly matches a non-looped *kL*-layer baseline while using far fewer parameters, and on downstream language-modeling reasoning tasks it is competitive with, and sometimes better than, the *kL*-layer model. The paper further connects looping to chain-of-thought (CoT) reasoning: looped models implicitly generate latent thoughts, can provably simulate *T* steps of CoT with *T* loops, and exhibit scaling behavior governed by effective depth rather than parameter count. Key contributions include: (1) establishing effective depth, rather than nominal layer or parameter count, as a primary determinant of reasoning ability; and (2) a looping-based regularization that is effective for both reasoning and memorization.
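The core mechanism can be sketched in a few lines. The following is a minimal illustrative sketch, not the authors' implementation: a toy residual "layer" stands in for a full transformer layer, and the same *k*-layer block is reapplied *L* times, so the model traverses depth *kL* while holding only *k* layers' worth of parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8   # hidden size (illustrative)
k = 2   # layers in the shared block
L = 3   # number of loops -> effective depth k * L

# One weight matrix per layer of the shared block (a stand-in for a real
# transformer layer's attention + MLP parameters).
block = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(k)]

def layer(x, W):
    # Toy residual layer: x + tanh(x @ W).
    return x + np.tanh(x @ W)

def looped_forward(x, block, L):
    # Reuse the SAME k weight matrices on every loop iteration.
    for _ in range(L):
        for W in block:
            x = layer(x, W)
    return x

x = rng.standard_normal((1, d))
y = looped_forward(x, block, L)

# Parameters scale with k only; depth traversed scales with k * L.
n_params = sum(W.size for W in block)
print(n_params, k * L)  # -> 128 6
```

The point of the sketch is the asymmetry it makes explicit: increasing `L` adds computational depth at zero parameter cost, which is the regime the paper argues many reasoning problems actually need.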

📝 Abstract
Large language models have shown remarkable reasoning abilities and scaling laws suggest that large parameter count, especially along the depth axis, is the primary driver. In this work, we make a stronger claim -- many reasoning problems require a large depth but not necessarily many parameters. This unlocks a novel application of looped models for reasoning. Firstly, we show that for many synthetic reasoning problems like addition, $p$-hop induction, and math problems, a $k$-layer transformer looped $L$ times nearly matches the performance of a $kL$-layer non-looped model, and is significantly better than a $k$-layer model. This is further corroborated by theoretical results showing that many such reasoning problems can be solved via iterative algorithms, and thus, can be solved effectively using looped models with nearly optimal depth. Perhaps surprisingly, these benefits also translate to practical settings of language modeling -- on many downstream reasoning tasks, a language model with $k$-layers looped $L$ times can be competitive to, if not better than, a $kL$-layer language model. In fact, our empirical analysis reveals an intriguing phenomenon: looped and non-looped models exhibit scaling behavior that depends on their effective depth, akin to the inference-time scaling of chain-of-thought (CoT) reasoning. We further elucidate the connection to CoT reasoning by proving that looped models implicitly generate latent thoughts and can simulate $T$ steps of CoT with $T$ loops. Inspired by these findings, we also present an interesting dichotomy between reasoning and memorization, and design a looping-based regularization that is effective on both fronts.
Problem

Research questions and friction points this paper is trying to address.

Does reasoning ability require many parameters, or primarily large computational depth?
Can a looped model match a deeper non-looped model, and does looping implicitly perform chain-of-thought reasoning?
How can looping be used to regularize the trade-off between reasoning and memorization?
Innovation

Methods, ideas, or system contributions that make the work stand out.

A *k*-layer transformer looped *L* times nearly matches a *kL*-layer non-looped model with far fewer parameters
Theoretical results showing iterative reasoning algorithms are solvable by looped models at near-optimal depth
A proof that looped models generate latent thoughts, simulating *T* steps of CoT with *T* loops
A looping-based regularization effective for both reasoning and memorization