Scaling Latent Reasoning via Looped Language Models

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) rely on explicit textual reasoning introduced during post-training (e.g., chain-of-thought, CoT) and fail to fully harness the latent reasoning capabilities acquired during pretraining. This work proposes LoopLM, the first LLM architecture to incorporate recurrent computation into pretraining, enabling iterative reasoning directly in latent space. To optimize the allocation of compute across loop iterations, the authors introduce an entropy-regularized depth-allocation strategy. Building on this framework, they develop the Ouro model family (1.4B/2.6B parameters), pretrained on 7.7 trillion tokens. Experiments demonstrate that Ouro matches or exceeds state-of-the-art 12B models on diverse reasoning benchmarks while producing more consistent reasoning trajectories. Crucially, these gains stem from enhanced internalized knowledge manipulation, not from increased parameter count or knowledge capacity.

📝 Abstract
Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. The Ouro 1.4B and 2.6B models deliver performance matching that of state-of-the-art LLMs of up to 12B parameters across a wide range of benchmarks. Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities. We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT does. We hope these results demonstrate the potential of LoopLM as a novel scaling direction in the reasoning era. Our models can be found at: http://ouro-llm.github.io.
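The abstract's "entropy-regularized objective for learned depth allocation" can be sketched abstractly: each loop depth yields its own language-modeling loss, the model learns a distribution over depths, and training minimizes the expected loss minus an entropy bonus so the depth distribution does not collapse prematurely. This is a minimal illustrative sketch under our own notation (`q`, `losses`, `beta`); it is not the Ouro training code, whose exact objective is defined in the paper.

```python
import math

# Hedged sketch of an entropy-regularized depth-allocation objective.
# q is a learned distribution over loop depths; losses[t] is the LM loss
# when exiting at depth t. All names and values here are illustrative.

def depth_allocation_loss(losses, q, beta=0.01):
    """Expected LM loss over depths, minus an entropy bonus on q."""
    assert abs(sum(q) - 1.0) < 1e-9, "q must be a distribution over depths"
    expected = sum(qi * li for qi, li in zip(q, losses))
    entropy = -sum(qi * math.log(qi + 1e-12) for qi in q)
    return expected - beta * entropy  # higher entropy earns a small credit

# Illustrative per-depth losses: deeper loops fit better, but the entropy
# term discourages collapsing all probability mass onto one depth.
losses = [2.3, 1.9, 1.7, 1.65]
uniform = [0.25] * 4
greedy = [0.0, 0.0, 0.0, 1.0]
print(depth_allocation_loss(losses, uniform))
print(depth_allocation_loss(losses, greedy))
```

With a small `beta`, the low-loss depth still dominates at convergence; the entropy term mainly shapes how quickly the allocation sharpens during training.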
Problem

Research questions and friction points this paper is trying to address.

Enhancing latent reasoning through iterative computation
Improving knowledge manipulation over capacity expansion
Aligning reasoning traces more closely with outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative latent space computation for reasoning
Entropy-regularized objective for depth allocation
Scaled pre-training with 7.7 trillion tokens
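The first two innovations above can be sketched together: a shared block is applied repeatedly to the same hidden state, and a halting head assigns probability mass to exiting at each loop, yielding a distribution over depths that an entropy term regularizes. The sketch below is a toy illustration under our own assumptions (`loop_step`, `exit_prob`, `MAX_LOOPS` are hypothetical stand-ins, not the Ouro architecture).

```python
import math

MAX_LOOPS = 4  # recurrence depth: the same block is applied up to 4 times

def loop_step(hidden, weight):
    """One application of the shared block (stand-in for a transformer stack)."""
    return [math.tanh(weight * h + 0.1) for h in hidden]

def exit_prob(hidden):
    """Toy halting head: squash the mean activation into (0, 1)."""
    mean = sum(hidden) / len(hidden)
    return 1.0 / (1.0 + math.exp(-mean))

def looped_forward(hidden, weight=0.5):
    """Iterate the shared block; collect a halting distribution over depths."""
    remaining = 1.0   # probability mass not yet assigned to an exit
    halt_dist = []    # halt_dist[t]: probability of exiting after loop t
    for t in range(MAX_LOOPS):
        hidden = loop_step(hidden, weight)
        p = exit_prob(hidden)
        if t == MAX_LOOPS - 1:
            halt_dist.append(remaining)  # force exit at maximum depth
        else:
            halt_dist.append(remaining * p)
            remaining *= 1.0 - p
    return hidden, halt_dist

def entropy_regularizer(halt_dist, coeff=0.01):
    """Entropy bonus on the depth distribution (subtracted from the loss)."""
    ent = -sum(q * math.log(q + 1e-12) for q in halt_dist)
    return -coeff * ent

hidden, halt = looped_forward([0.2, -0.1, 0.4])
assert abs(sum(halt) - 1.0) < 1e-9  # exit probabilities form a distribution
```

Because the block's weights are shared across iterations, extra loops add compute without adding parameters, which is consistent with the paper's claim that the gains come from knowledge manipulation rather than capacity.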