GTS: Inference-Time Scaling of Latent Reasoning with a Learnable Gaussian Thought Sampler

📅 2026-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing test-time scaling methods, which rely on heuristic stochastic perturbations that are inefficient in exploration and prone to disrupting internal model structures. To overcome this, the authors propose a learnable Gaussian Thought Sampler (GTS) that, while keeping the backbone network frozen, formulates test-time exploration as a context-dependent conditional probability distribution. This enables structured and optimizable perturbations in a continuous latent space. Notably, GTS is the first approach to replace unguided noise with a learnable mechanism, substantially improving the validity of reasoning trajectories under limited sampling budgets. Experiments on the GSM8K benchmark demonstrate consistent and significant gains over heuristic baselines across two distinct implicit reasoning architectures, validating the efficacy of the proposed structured exploration framework.
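The summary above describes GTS as predicting a context-dependent Gaussian distribution over perturbations of the continuous latent reasoning state, rather than injecting fixed noise. A minimal sketch of that idea in NumPy, assuming a simple linear sampler head and illustrative dimensions (the layer shapes, names, and sampling interface here are assumptions for exposition, not the paper's implementation):

```python
# Toy sketch: a learnable head maps the current latent state h to the mean and
# log-std of a Gaussian perturbation distribution N(mu(h), sigma(h)^2), and
# candidate "thought" states are drawn via the reparameterization trick.
import numpy as np

rng = np.random.default_rng(0)

d = 8   # latent (thought) dimension -- illustrative
k = 4   # number of candidate trajectories to sample per step

# Tiny linear sampler head (the trainable part; the backbone stays frozen).
W_mu, b_mu = rng.normal(scale=0.1, size=(d, d)), np.zeros(d)
W_ls, b_ls = rng.normal(scale=0.1, size=(d, d)), np.full(d, -1.0)

def sample_perturbed(h, n):
    """Draw n perturbed copies of latent state h: h' = mu(h) + sigma(h) * eps."""
    mu = h @ W_mu + b_mu              # context-dependent shift
    sigma = np.exp(h @ W_ls + b_ls)   # context-dependent scale (always positive)
    eps = rng.standard_normal((n, h.shape[-1]))
    return mu + sigma * eps           # broadcasts to (n, d)

h = rng.standard_normal(d)            # stand-in for the frozen backbone's latent state
candidates = sample_perturbed(h, k)
print(candidates.shape)               # (4, 8)
```

Because both the shift and the scale depend on `h`, exploration is steered by context instead of being uniform noise, which is the contrast the summary draws with heuristic perturbations.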

📝 Abstract
Inference-time scaling (ITS) in latent reasoning models typically introduces stochasticity through heuristic perturbations, such as dropout or fixed Gaussian noise. While these methods increase trajectory diversity, their exploration behavior is not explicitly modeled and can be inefficient under finite sampling budgets. We observe that stronger perturbations do not necessarily translate into more effective candidate trajectories, as unguided noise may disrupt internal decision structure rather than steer it. To provide a more structured alternative, we model latent thought exploration as conditional sampling from learnable densities and instantiate this idea as a Gaussian Thought Sampler (GTS). GTS predicts context-dependent perturbation distributions over continuous reasoning states and is trained with GRPO-style policy optimization while keeping the backbone frozen. Experiments on GSM8K with two latent reasoning architectures show that GTS achieves more reliable inference-time scaling than heuristic baselines. These findings indicate that improving latent ITS requires structured and optimizable exploration mechanisms rather than simply amplifying stochasticity.
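The abstract states that the sampler is trained with GRPO-style policy optimization while the backbone stays frozen. The core of GRPO is a group-relative advantage: each sampled trajectory's reward is normalized against the group of candidates drawn for the same question. A toy sketch with made-up rewards and log-probabilities (the numbers and the surrogate-loss form are illustrative assumptions, not taken from the paper):

```python
# Toy sketch of GRPO-style group-relative advantages for a group of k sampled
# trajectories answering the same question.
import numpy as np

rewards = np.array([1.0, 0.0, 1.0, 0.0])  # e.g. per-trajectory answer correctness (made up)

# Advantage = reward standardized within the group (mean 0, unit std).
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Stand-in per-trajectory log-probs under the sampler's Gaussian densities.
logp = np.array([-1.2, -0.8, -1.5, -0.7])

# REINFORCE-style surrogate: gradient ascent on adv-weighted log-probs updates
# only the sampler head; the frozen backbone receives no gradients.
loss = -(adv * logp).mean()
print(adv, loss)
```

Rewarded trajectories get positive advantages and their perturbations are made more likely; the group baseline removes the need for a separate value network, which is the usual appeal of GRPO-style training.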
Problem

Research questions and friction points this paper is trying to address.

inference-time scaling
latent reasoning
stochastic perturbation
trajectory diversity
exploration efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inference-Time Scaling
Latent Reasoning
Gaussian Thought Sampler
Learnable Perturbation
Policy Optimization
Minghan Wang
Department of Data Science & AI, Monash University

Ye Bai
School of Computing and Information Systems, University of Melbourne

Thuy-Trang Vu
Monash University
Natural Language Processing · Machine Learning

Ehsan Shareghi
Monash University
Natural Language Processing

Gholamreza Haffari
Department of Data Science & AI, Monash University