RuPLaR: Efficient Latent Compression of LLM Reasoning Chains with Rule-Based Priors From Multi-Step to One-Step

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing latent chain-of-thought (Latent CoT) methods are often hindered by the complexity of multi-step reasoning structures, error propagation, and the substantial overhead of coordinating multiple models. This work proposes a single-model, single-stage latent reasoning compression framework that, for the first time, leverages rule-based priors to guide large language models in autonomously generating latent reasoning tokens within a unified training phase, thereby eliminating cascaded inference and dependence on multiple models. The approach achieves end-to-end joint optimization through KL divergence alignment between soft tokens and rule priors, cross-entropy constraints to ensure answer consistency, and a question-to-reasoning semantic alignment mechanism. Evaluated under extremely low token consumption, the method improves reasoning accuracy by 11.1% over current Latent CoT approaches while significantly reducing system complexity and enhancing scalability.
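One way to read the three-part objective described in this summary is as a weighted sum. The form below is only a sketch: the weights \lambda_{\text{KL}} and \lambda_{\text{align}}, the direction of the KL term, and all symbol names are assumptions for illustration and are not taken from the paper.

```latex
\mathcal{L}_{\text{total}}
  = \underbrace{\mathcal{L}_{\text{CE}}(\hat{y}, y)}_{\text{answer consistency}}
  + \lambda_{\text{KL}} \, \underbrace{D_{\text{KL}}\!\left(p_{\text{rule}} \,\|\, q_{\theta}\right)}_{\text{soft tokens vs. rule prior}}
  + \lambda_{\text{align}} \, \underbrace{\mathcal{L}_{\text{align}}\!\left(h_{\text{question}}, h_{\text{thought}}\right)}_{\text{question-to-reasoning alignment}}
```

Here \hat{y} is the predicted answer distribution, y the reference answer, p_{\text{rule}} the rule-based prior, q_{\theta} the model's soft-token distribution, and h_{\text{question}}, h_{\text{thought}} the representations being aligned.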

📝 Abstract

The Chain-of-Thought (CoT) paradigm, while enhancing the interpretability of Large Language Models (LLMs), is constrained by the inefficiencies and expressive limits of natural language. Latent Chain-of-Thought (latent CoT) reasoning, which operates in a continuous latent space, offers a promising alternative but faces challenges from the structural complexity of existing multi-step or multi-model paradigms, such as error propagation and coordination overhead. In this paper, we introduce One-Model One-Step, a novel compression framework for Latent Reasoning with Rule-Based Priors (RuPLaR), to address these challenges. Our method trains an LLM to autonomously generate latent reasoning tokens in a single training stage, guided by rule-based prior probability distributions, thereby eliminating cascaded processes and inter-model dependencies. To ensure reasoning quality, we design a joint training objective that enforces answer consistency via cross-entropy, aligns soft tokens with rule-based priors via KL divergence (the Soft Thinking constraint), and adds a problem-thought semantic alignment constraint in the representation space. Extensive experiments show that our compression framework improves accuracy by 11.1% over existing latent CoT methods while using minimal tokens, underscoring its effectiveness and extensibility. Code: https://github.com/xiaocen-luo/RuPLaR.
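The joint training objective described in the abstract can be illustrated numerically with a toy, standard-library-only sketch. This is not the authors' implementation: the function names, the KL direction (prior to model), the use of cosine distance for the semantic alignment term, and the two weights are all assumptions chosen for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the reference answer token."""
    return -math.log(probs[target_index])

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def cosine_distance(u, v):
    """1 - cosine similarity; 0 when the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def joint_loss(answer_logits, target_index, soft_token_logits, rule_prior,
               question_vec, thought_vec, kl_weight=1.0, align_weight=1.0):
    """Hypothetical combination of the three terms named in the abstract."""
    # Answer consistency: cross-entropy on the final answer.
    ce = cross_entropy(softmax(answer_logits), target_index)
    # Soft-token constraint: pull the soft-token distribution toward the prior.
    # The direction KL(prior || soft) is an assumption.
    kl = kl_divergence(rule_prior, softmax(soft_token_logits))
    # Problem-thought alignment: cosine distance is an assumed stand-in.
    align = cosine_distance(question_vec, thought_vec)
    total = ce + kl_weight * kl + align_weight * align
    return {"ce": ce, "kl": kl, "align": align, "total": total}

if __name__ == "__main__":
    losses = joint_loss(
        answer_logits=[2.0, 0.5, -1.0], target_index=0,
        soft_token_logits=[1.0, 1.0, 0.0], rule_prior=[0.5, 0.4, 0.1],
        question_vec=[0.2, 0.9, 0.1], thought_vec=[0.3, 0.8, 0.0],
    )
    for name, value in losses.items():
        print(f"{name}: {value:.4f}")
```

In a real training setup the three terms would be computed on model tensors and backpropagated together in the single training stage the abstract describes; the weights here simply default to 1.0.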

Problem

Research questions and friction points this paper is trying to address.

Latent Chain-of-Thought

error propagation

coordination overhead

reasoning efficiency

multi-step reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Chain-of-Thought

Rule-Based Priors

One-Model One-Step