Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning

📅 2025-11-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses computational redundancy in latent reasoning models caused by fixed inference lengths. We propose a reinforcement learning-based adaptive latent reasoning framework that jointly optimizes inference step count and task accuracy in the post-supervised fine-tuning (SFT) stage. Leveraging knowledge distillation and latent state propagation, our method dynamically adjusts the latent reasoning length of the Llama 3.2 1B model, eliminating reliance on explicit token-by-token autoregressive expansion. Evaluated on GSM8K-Aug, it reduces total inference steps by 52% with zero accuracy degradation, significantly improving model compression ratio and inference efficiency. The core contribution lies in modeling inference length as a learnable policy, enabling fine-grained, task-adaptive allocation of computational resources while preserving semantic completeness.
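The key mechanism the summary describes, feeding the final latent state back as the next step's input under a learned stopping policy, can be sketched as follows. This is a toy illustration only: the weight matrices, the sigmoid stop head, the dimension, and the threshold are all assumptions for the sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d)) * 0.1   # stand-in for one transformer reasoning step
w_stop = rng.standard_normal(d) * 0.1   # stand-in for a learned stop-probability head

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def latent_reason(h0, max_steps=16, stop_threshold=0.5):
    """Propagate the latent state until the stop head fires or max_steps is hit.

    Instead of decoding human-language tokens autoregressively, each step
    consumes the previous final latent state directly.
    """
    h = h0
    for step in range(1, max_steps + 1):
        h = np.tanh(W @ h)                    # one latent reasoning step
        if sigmoid(w_stop @ h) > stop_threshold:
            break                             # policy chose to stop early
    return h, step

h_final, steps_used = latent_reason(rng.standard_normal(d))
print(steps_used)  # number of latent steps actually taken (at most 16)
```

With random untrained weights the stop decision is arbitrary; the point of the RL stage is to train the stopping policy so that easy problems terminate in few steps and hard ones use more.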

๐Ÿ“ Abstract
Latent reasoning represents a new development in Transformer language models that has shown potential in compressing reasoning lengths compared to chain-of-thought reasoning. By directly passing the information-rich previous final latent state into the next sequence, latent reasoning removes the restriction to human language tokens as the medium for reasoning. We develop adaptive-length latent reasoning models and introduce a post-SFT reinforcement-learning methodology to optimize latent reasoning length by minimizing reasoning length while maintaining accuracy. This, in turn, further reduces compute usage and raises the bar on the compressive capabilities of latent reasoning models. Experiments on the Llama 3.2 1B model and the GSM8K-Aug dataset show a 52% drop in total reasoning length with no penalty to accuracy. In future work, we plan to extend to additional models and datasets, analyze relationships between training coefficients, experiment with architecture variations, and continue our knowledge distillation for latent reasoning SFT efforts. We make our code and pretrained weights available at https://github.com/apning/adaptive-latent-reasoning.
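The abstract's joint objective, minimize reasoning length while maintaining accuracy, is naturally expressed as a length-penalized reward for the RL stage. A minimal sketch follows; the linear penalty form and the coefficient `lambda_len` are assumptions for illustration, not the paper's actual reward function.

```python
def length_penalized_reward(correct: bool, num_steps: int,
                            lambda_len: float = 0.05) -> float:
    """Reward a correct answer, minus a cost per latent reasoning step.

    lambda_len trades accuracy against compute: larger values push the
    policy toward shorter latent reasoning chains. (Hypothetical form.)
    """
    return (1.0 if correct else 0.0) - lambda_len * num_steps

# A correct answer reached in fewer steps earns a higher reward:
r_short = length_penalized_reward(True, 4)    # 1.0 - 0.05 * 4  = 0.8
r_long = length_penalized_reward(True, 12)    # 1.0 - 0.05 * 12 = 0.4
r_wrong = length_penalized_reward(False, 4)   # 0.0 - 0.05 * 4  = -0.2
```

Under any such reward, a policy that keeps accuracy fixed while cutting steps strictly increases expected return, which is the behavior the reported 52% step reduction reflects.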
Problem

Research questions and friction points this paper is trying to address.

Optimizing latent reasoning length via reinforcement learning
Reducing computational usage while maintaining model accuracy
Developing adaptive-length models for compressed reasoning processes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive latent reasoning models optimize reasoning length
Reinforcement learning minimizes compute usage while maintaining accuracy
Latent reasoning compresses reasoning steps without human language tokens
Alex Ning, University of Virginia
Yen-Ling Kuo, University of Virginia (Artificial Intelligence, Robotics, Human-AI/Robot Interaction)
Gabe Gomes, Carnegie Mellon University