Efficient Post-Training Refinement of Latent Reasoning in Large Language Models

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of inefficiently optimizing implicit reasoning trajectories during post-training of large language models (LLMs), this paper proposes a lightweight fine-tuning framework that requires no explicit chain-of-thought supervision. Methodologically, it introduces (1) a contrastive reasoning feedback mechanism that leverages positive and negative sample pairs to guide directional updates of reasoning paths in the latent space, and (2) a residual embedding fine-tuning strategy that achieves stable, low-overhead embedding optimization via gradient residual accumulation. Evaluated on five mathematical and logical reasoning benchmarks—including MathQA—the method yields significant performance gains: for instance, a 5% absolute accuracy improvement on MathQA. Crucially, it incurs no additional inference latency or increase in trainable parameters, demonstrating both effectiveness and practical deployability.
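The contrastive feedback mechanism described above can be read as moving a reasoning embedding along the direction separating a strong baseline from a weak one. The sketch below is a minimal numpy illustration of that reading, not the paper's implementation; the function name `contrastive_update`, the step size `alpha`, and the toy baselines are all illustrative assumptions.

```python
import numpy as np

def contrastive_update(z, z_strong, z_weak, alpha=0.1):
    """Illustrative sketch: nudge the reasoning embedding z toward the
    strong baseline and away from the weak one. The difference vector
    between the two baselines serves as the inferred update direction."""
    direction = z_strong - z_weak                    # strong-minus-weak direction
    direction = direction / (np.linalg.norm(direction) + 1e-8)
    return z + alpha * direction                     # small directional step

# toy example: baselines offset from the current embedding
rng = np.random.default_rng(0)
z = rng.normal(size=8)
z_strong, z_weak = z + 0.5, z - 0.5                  # hypothetical baselines
z_new = contrastive_update(z, z_strong, z_weak)
```

After the update, `z_new` lies strictly closer to the strong baseline than `z` did, which is the intended directional effect.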

📝 Abstract
Reasoning is a key component of language understanding in Large Language Models. While Chain-of-Thought prompting enhances performance via explicit intermediate steps, it suffers from substantial token overhead and a fixed reasoning trajectory, preventing step-wise refinement. Recent advances in latent reasoning address these limitations by refining internal reasoning processes directly in the model's latent space, without producing explicit outputs. However, a key challenge remains: how to effectively update reasoning embeddings during post-training to guide the model toward more accurate solutions. To overcome this challenge, we propose a lightweight post-training framework that refines latent reasoning trajectories using two novel strategies: 1) Contrastive reasoning feedback, which compares reasoning embeddings against strong and weak baselines to infer effective update directions via embedding enhancement; 2) Residual embedding refinement, which stabilizes updates by progressively integrating current and historical gradients, enabling fast yet controlled convergence. Extensive experiments and case studies are conducted on five reasoning benchmarks to demonstrate the effectiveness of the proposed framework. Notably, it achieves a 5% accuracy gain on MathQA without additional training.
Problem

Research questions and friction points this paper is trying to address.

Improving latent reasoning in LLMs without explicit outputs
Updating reasoning embeddings effectively post-training
Enhancing accuracy via contrastive feedback and residual refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive reasoning feedback for embedding enhancement
Residual embedding refinement for stable updates
Lightweight post-training framework for latent reasoning
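The residual embedding refinement listed above, "progressively integrating current and historical gradients," reads like a momentum-style accumulation. The following numpy sketch shows one such reading under that assumption; `residual_refine`, the decay `beta`, the learning rate `lr`, and the toy quadratic objective are illustrative, not taken from the paper.

```python
import numpy as np

def residual_refine(z, grad, state, beta=0.9, lr=0.05):
    """Illustrative sketch: blend the current gradient with a running
    residual of past gradients, then take a small, stable step. The
    accumulation damps abrupt changes to the embedding."""
    state = beta * state + (1 - beta) * grad   # historical + current gradients
    return z - lr * state, state               # controlled update

# toy objective: pull an embedding toward a target vector
target = np.ones(8)
z = np.zeros(8)
state = np.zeros(8)
for _ in range(200):
    grad = 2 * (z - target)                    # gradient of ||z - target||^2
    z, state = residual_refine(z, grad, state)
final_dist = np.linalg.norm(z - target)
```

On this toy quadratic the accumulated updates converge smoothly to the target, illustrating the "fast yet controlled convergence" the abstract claims for the real method.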