Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard chain-of-thought reasoning is irreversible within a single forward pass, making early errors difficult to correct. This work proposes a "rethinking-at-inference" framework that decouples declarative latent thought vectors from procedural generation, enabling iterative self-correction. It introduces continuous latent thought vectors as an optimizable representation of reasoning structure, together with a gradient-based mechanism for refining reasoning strategies at test time. The approach combines Gibbs-style alternating optimization, manifold prior learning, and training small language models from scratch. On GSM8K, a 0.2B-parameter model, after 30 rethinking iterations, surpasses baselines with 10 to 15 times more parameters, including 3B-scale models, suggesting that strong mathematical reasoning can come from inference-time computation rather than parameter count alone.
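
In generative-model terms, the factorization the summary describes can plausibly be written as below; the notation is ours for illustration, not taken from the paper. The latent thought vector is sampled from a learned prior, and the decoder verbalizes the reasoning trace conditioned on it.

```latex
% x: question, z: continuous latent thought vector ("what to reason about"),
% y: verbalized reasoning trace ("how to reason").
% Notation is illustrative, not the paper's.
p(y \mid x) = \int p_\theta(y \mid x, z)\, p_\alpha(z)\, \mathrm{d}z
```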

📝 Abstract
Standard chain-of-thought reasoning generates a solution in a single forward pass, committing irrevocably to each token and lacking a mechanism to recover from early errors. We introduce Inference-Time Rethinking, a generative framework that enables iterative self-correction by decoupling declarative latent thought vectors from procedural generation. We factorize reasoning into a continuous latent thought vector (what to reason about) and a decoder that verbalizes the trace conditioned on this vector (how to reason). Beyond serving as a declarative buffer, latent thought vectors compress the reasoning structure into a continuous representation that abstracts away surface-level token variability, making gradient-based optimization over reasoning strategies well-posed. Our prior model maps unstructured noise to a learned manifold of valid reasoning patterns, and at test time we employ a Gibbs-style procedure that alternates between generating a candidate trace and optimizing the latent vector to better explain that trace, effectively navigating the latent manifold to refine the reasoning strategy. Training a 0.2B-parameter model from scratch on GSM8K, our method with 30 rethinking iterations surpasses baselines with 10 to 15 times more parameters, including a 3B counterpart. This result demonstrates that effective mathematical reasoning can emerge from sophisticated inference-time computation rather than solely from massive parameter counts.
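
As a concrete illustration, here is a minimal sketch of the Gibbs-style rethinking loop the abstract describes, assuming a trained prior network (noise to latent thought vector) and a decoder exposing the interfaces shown. All names here (`prior`, `decoder.generate`, `decoder.log_prob`, `noise_dim`) are hypothetical stand-ins, not the authors' API.

```python
# Minimal sketch of inference-time rethinking: alternate between decoding a
# candidate trace and gradient steps that move the latent thought vector to
# better explain that trace. Module names and signatures are hypothetical.
import torch

def rethink(question, prior, decoder, n_iters=30, n_grad_steps=5, lr=1e-2):
    # Unstructured noise; the prior maps it onto the learned manifold
    # of valid reasoning patterns.
    eps = torch.randn(1, prior.noise_dim, requires_grad=True)
    opt = torch.optim.Adam([eps], lr=lr)

    for _ in range(n_iters):
        # Step 1 (generate): verbalize a candidate reasoning trace
        # conditioned on the current latent thought vector.
        with torch.no_grad():
            trace = decoder.generate(question, prior(eps))

        # Step 2 (rethink): take gradient steps on the noise so that the
        # resulting latent vector better explains the candidate trace,
        # i.e. maximize log p(trace | question, z) with z = prior(eps).
        for _ in range(n_grad_steps):
            opt.zero_grad()
            loss = -decoder.log_prob(trace, question, prior(eps))
            loss.backward()
            opt.step()

    # Final answer decoded from the refined latent thought vector.
    with torch.no_grad():
        return decoder.generate(question, prior(eps))
```

Optimizing the noise input rather than the latent vector directly is one way to read "navigating the latent manifold": every candidate stays on the prior's learned manifold of valid reasoning patterns. The paper may parameterize this step differently.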
Problem

Research questions and friction points this paper is trying to address.

chain-of-thought reasoning
mathematical reasoning
inference-time correction
early error recovery
self-correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inference-Time Rethinking
Latent Thought Vectors
Iterative Self-Correction
Gradient-Based Reasoning Optimization
Mathematical Reasoning