🤖 AI Summary
Existing latent chain-of-thought (Latent CoT) methods are often hindered by complex multi-step reasoning structures, error propagation, and the substantial overhead of coordinating multiple models. This work proposes a single-model, single-stage latent reasoning compression framework that, for the first time, uses rule-based priors to guide a large language model to autonomously generate latent reasoning tokens within a single training phase. This eliminates cascaded inference and the dependence on multiple models. The framework is optimized end to end with three jointly trained terms: a KL divergence term that aligns soft tokens with the rule-based priors, a cross-entropy term that enforces answer consistency, and a question-to-reasoning semantic alignment term. While consuming very few tokens, the method improves reasoning accuracy by 11.1% over existing Latent CoT approaches. It also substantially reduces system complexity and improves scalability.
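The summary does not say how a "soft token" is built. In Soft Thinking-style methods, a soft token is commonly formed as the probability-weighted average of the vocabulary embeddings rather than a single sampled token. The sketch below assumes that construction. The toy vocabulary, the embedding table, and the helper names `softmax` and `soft_token` are hypothetical illustrations, not RuPLaR's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_token(probs, embedding_table):
    """Return the probability-weighted mixture of embedding rows.

    Assumption: a latent "soft" token is the expectation of the token
    embedding under the model's next-token distribution.
    probs: length-V distribution over the vocabulary.
    embedding_table: V rows, each a length-d embedding vector.
    """
    dim = len(embedding_table[0])
    mixed = [0.0] * dim
    for p, row in zip(probs, embedding_table):
        for j in range(dim):
            mixed[j] += p * row[j]
    return mixed


# Toy example (hypothetical): a 3-token vocabulary with 2-d embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = softmax([2.0, 0.5, 0.5])
latent = soft_token(probs, embeddings)
```

With a one-hot distribution the soft token reduces to that token's ordinary embedding. With a spread-out distribution it sits between several embeddings, which is what lets a latent step carry more than one candidate continuation at once.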
📝 Abstract
The Chain-of-Thought (CoT) paradigm, while enhancing the interpretability of Large Language Models (LLMs), is constrained by the inefficiency and limited expressiveness of natural language. Latent Chain-of-Thought (latent CoT) reasoning operates in a continuous latent space and offers a promising alternative. However, it faces challenges such as error propagation and coordination overhead, which stem from the structural complexity of existing multi-step or multi-model paradigms. In this paper, we introduce One-Model One-Step, a novel compression framework for Latent Reasoning with Rule-Based Priors (RuPLaR), to address these challenges. Our method trains an LLM to autonomously generate latent reasoning tokens in a single training stage, guided by rule-based prior probability distributions, thereby eliminating cascaded processes and inter-model dependencies. To ensure reasoning quality, we design a joint training objective with three components. It enforces answer consistency via cross-entropy, aligns soft tokens with the rule-based priors via KL divergence (the Soft Thinking constraint), and adds a problem-thought semantic alignment constraint in the representation space. Extensive experiments show that our compression framework improves accuracy by 11.1% over existing latent CoT methods while using minimal tokens, underscoring its effectiveness and extensibility. Code: https://github.com/xiaocen-luo/RuPLaR.
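The joint objective described above combines three terms, and a toy version makes their roles concrete. This is a minimal sketch under several assumptions that the abstract does not confirm:

- the KL direction is KL(prior ‖ model);
- the KL term is averaged over latent steps;
- "semantic alignment" is taken as 1 minus the cosine similarity between a pooled problem representation and a pooled thought representation;
- the weights `lambda_kl` and `lambda_align` are hypothetical.

How the rule-based priors themselves are derived is left as an input.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the gold answer token under the model."""
    return -math.log(probs[target_index])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def joint_loss(answer_probs, answer_target, soft_token_dists, rule_prior_dists,
               question_repr, thought_repr, lambda_kl=1.0, lambda_align=1.0):
    """Hypothetical combination: CE + lambda_kl * KL + lambda_align * alignment."""
    # Answer consistency: cross-entropy on the gold answer token.
    ce = cross_entropy(answer_probs, answer_target)
    # Soft Thinking constraint (assumed form): mean KL(prior || model) over latent steps.
    kl = sum(kl_divergence(prior, dist)
             for prior, dist in zip(rule_prior_dists, soft_token_dists)) / len(soft_token_dists)
    # Problem-thought alignment (assumed form): 1 - cosine similarity.
    align = 1.0 - cosine_similarity(question_repr, thought_repr)
    return ce + lambda_kl * kl + lambda_align * align


# Toy usage (all numbers hypothetical).
loss = joint_loss(
    answer_probs=[0.25, 0.75], answer_target=1,
    soft_token_dists=[[0.6, 0.4]], rule_prior_dists=[[0.5, 0.5]],
    question_repr=[1.0, 0.0], thought_repr=[0.6, 0.8],
)
```

Averaging the KL term over latent steps keeps its scale independent of how many latent tokens are generated. A summed variant would weight longer latent chains more heavily, and the paper may make either choice.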