Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard chain-of-thought reasoning is irreversible within a single forward pass, making early errors difficult to correct. This work proposes a "rethinking-at-inference" framework that decouples declarative latent thought vectors from procedural generation, enabling iterative self-correction. It introduces continuous latent thought vectors as an optimizable representation of reasoning structure, together with a gradient-based mechanism for refining reasoning strategies at test time. The approach combines Gibbs-style alternating optimization, manifold prior learning, and training small language models from scratch. On GSM8K, a 0.2B-parameter model, after 30 rethinking iterations, surpasses baselines with 10 to 15 times more parameters, including 3B-scale models, suggesting that strong mathematical reasoning can come from inference-time computation rather than parameter count alone.
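
In generative-model terms, the factorization the summary describes can plausibly be written as below; the notation is ours for illustration, not taken from the paper. The latent thought vector is sampled from a learned prior, and the decoder verbalizes the reasoning trace conditioned on it.

```latex
% x: question, z: continuous latent thought vector ("what to reason about"),
% y: verbalized reasoning trace ("how to reason").
% Notation is illustrative, not the paper's.
p(y \mid x) = \int p_\theta(y \mid x, z)\, p_\alpha(z)\, \mathrm{d}z
```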

📝 Abstract
Standard chain-of-thought reasoning generates a solution in a single forward pass, committing irrevocably to each token and lacking a mechanism to recover from early errors. We introduce Inference-Time Rethinking, a generative framework that enables iterative self-correction by decoupling declarative latent thought vectors from procedural generation. We factorize reasoning into a continuous latent thought vector (what to reason about) and a decoder that verbalizes the trace conditioned on this vector (how to reason). Beyond serving as a declarative buffer, latent thought vectors compress the reasoning structure into a continuous representation that abstracts away surface-level token variability, making gradient-based optimization over reasoning strategies well-posed. Our prior model maps unstructured noise to a learned manifold of valid reasoning patterns, and at test time we employ a Gibbs-style procedure that alternates between generating a candidate trace and optimizing the latent vector to better explain that trace, effectively navigating the latent manifold to refine the reasoning strategy. Training a 0.2B-parameter model from scratch on GSM8K, our method with 30 rethinking iterations surpasses baselines with 10 to 15 times more parameters, including a 3B counterpart. This result demonstrates that effective mathematical reasoning can emerge from sophisticated inference-time computation rather than solely from massive parameter counts.
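
As a concrete illustration, here is a minimal sketch of the Gibbs-style rethinking loop the abstract describes, assuming a trained prior network (noise to latent thought vector) and a decoder exposing the interfaces shown. All names here (`prior`, `decoder.generate`, `decoder.log_prob`, `noise_dim`) are hypothetical stand-ins, not the authors' API.

```python
# Minimal sketch of inference-time rethinking: alternate between decoding a
# candidate trace and gradient steps that move the latent thought vector to
# better explain that trace. Module names and signatures are hypothetical.
import torch

def rethink(question, prior, decoder, n_iters=30, n_grad_steps=5, lr=1e-2):
    # Unstructured noise; the prior maps it onto the learned manifold
    # of valid reasoning patterns.
    eps = torch.randn(1, prior.noise_dim, requires_grad=True)
    opt = torch.optim.Adam([eps], lr=lr)

    for _ in range(n_iters):
        # Step 1 (generate): verbalize a candidate reasoning trace
        # conditioned on the current latent thought vector.
        with torch.no_grad():
            trace = decoder.generate(question, prior(eps))

        # Step 2 (rethink): take gradient steps on the noise so that the
        # resulting latent vector better explains the candidate trace,
        # i.e. maximize log p(trace | question, z) with z = prior(eps).
        for _ in range(n_grad_steps):
            opt.zero_grad()
            loss = -decoder.log_prob(trace, question, prior(eps))
            loss.backward()
            opt.step()

    # Final answer decoded from the refined latent thought vector.
    with torch.no_grad():
        return decoder.generate(question, prior(eps))
```

Optimizing the noise input rather than the latent vector directly is one way to read "navigating the latent manifold": every candidate stays on the prior's learned manifold of valid reasoning patterns. The paper may parameterize this step differently.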
Problem

Research questions and friction points this paper is trying to address.

chain-of-thought reasoning
mathematical reasoning
inference-time correction
early error recovery
self-correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inference-Time Rethinking
Latent Thought Vectors
Iterative Self-Correction
Gradient-Based Reasoning Optimization
Mathematical Reasoning