Latent Reasoning in LLMs as a Vocabulary-Space Superposition

📅 2025-10-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Explicit chain-of-thought (CoT) reasoning in large language models (LLMs) incurs high computational overhead, while existing latent reasoning approaches suffer substantial performance degradation because their latent spaces are unstructured. This paper proposes Latent-SFT, the first method to model implicit reasoning as superposition states in the vocabulary space: through a two-stage training paradigm, the model learns to autonomously generate latent states within the column space of the LLM vocabulary and to collapse them into explicit tokens, without producing explicit CoT sequences. Key innovations include dedicated attention masks, a Latent Token Encoder, vocabulary-space projection, and joint KL-divergence and cross-entropy optimization. The paper further introduces two quantitative metrics, effective compression ratio and effective global parallelism, to measure inference-path compression and multi-path fusion. On GSM8k, Latent-SFT matches explicit supervised fine-tuning (SFT) performance while reducing reasoning length by up to 75%; it also significantly outperforms prior latent-state methods on Math500 and AIME24.
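
As a rough illustration of the central idea (the shapes, names, and use of the input embedding matrix are our own assumptions, not the paper's released code), a latent token can be read as a probability-weighted mixture of vocabulary embeddings, i.e. a point in the span of the vocabulary embedding vectors, which later "collapses" to a single explicit token:

```python
import torch
import torch.nn.functional as F

def latent_token(logits: torch.Tensor, embedding: torch.Tensor) -> torch.Tensor:
    """Latent token as a superposition over the vocabulary.

    logits:    (vocab_size,) next-token logits from the LM head
    embedding: (vocab_size, hidden_dim) vocabulary embedding matrix
    Returns a (hidden_dim,) vector: the probability-weighted mixture of
    vocabulary embeddings, i.e. a point in the space they span.
    """
    probs = F.softmax(logits, dim=-1)   # superposition weights over tokens
    return probs @ embedding            # expected embedding under that distribution

def collapse(logits: torch.Tensor) -> int:
    """Collapse the superposition to a single explicit token (an 'eigenstate')."""
    return int(torch.argmax(logits))
```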

📝 Abstract
Large language models (LLMs) demonstrate strong reasoning abilities with chain-of-thought prompting, but explicit reasoning introduces substantial computational overhead. Recent work on latent reasoning reduces this cost by reasoning in latent space without explicit supervision, but performance drops significantly. Our preliminary experiments suggest that this degradation stems from the unstructured latent space, which makes fitting latent tokens difficult. To address this, we restrict the latent space to the column space of the LLM vocabulary, treating latent reasoning as a superposition over vocabulary probabilities. Once latent reasoning concludes, it collapses into an eigenstate of explicit reasoning to yield the final answer. Based on this idea, we propose Latent-SFT, a two-stage learning framework. In the first stage, we design two specialized attention masks to guide the Latent Token Encoder in generating latent tokens, allowing the LLM to produce the correct answer conditioned on them. In the second stage, the Latent Token Encoder is discarded, and the LLM is directly trained to generate these latent tokens autonomously for latent reasoning, optimized with KL and CE losses. Latent-SFT sets a new state of the art on GSM8k, matching explicit SFT performance while cutting reasoning chains by up to 4 times and outperforming prior latent methods. On Math500 and AIME24, lexical probability-based latent reasoning also clearly surpasses hidden-state-based approaches. Our metrics of effective compression rate and effective global parallelism further show that latent reasoning is both the compression of a single path and the superposition of multiple paths.
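
A minimal sketch of how such a two-phase decode might look at inference time, assuming a Hugging-Face-style causal LM that accepts `inputs_embeds` and exposes `get_input_embeddings()`; the fixed number of latent steps and the greedy, EOS-free explicit decoding are simplifications for illustration, not the paper's procedure:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def latent_then_explicit_decode(model, input_embeds, num_latent_steps=8, max_answer_tokens=64):
    """Toy two-phase decoding: latent (superposed) steps, then explicit tokens.

    Assumes a model that accepts `inputs_embeds` of shape (1, seq_len, hidden)
    and returns an object with `.logits`; these conventions are illustrative
    assumptions rather than the paper's interface.
    """
    embed_matrix = model.get_input_embeddings().weight                # (vocab, hidden)

    # Latent phase: feed back probability-weighted embedding mixtures, not tokens.
    for _ in range(num_latent_steps):
        logits = model(inputs_embeds=input_embeds).logits[:, -1, :]  # (1, vocab)
        probs = F.softmax(logits, dim=-1)
        latent = (probs @ embed_matrix).unsqueeze(1)                  # (1, 1, hidden)
        input_embeds = torch.cat([input_embeds, latent], dim=1)

    # Explicit phase: collapse to discrete tokens and greedily decode the answer.
    answer_ids = []
    for _ in range(max_answer_tokens):
        logits = model(inputs_embeds=input_embeds).logits[:, -1, :]
        next_id = int(logits.argmax(dim=-1))
        answer_ids.append(next_id)
        next_embed = embed_matrix[next_id].view(1, 1, -1)
        input_embeds = torch.cat([input_embeds, next_embed], dim=1)
    return answer_ids
```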
Problem

Research questions and friction points this paper is trying to address.

Reducing computational overhead of explicit reasoning in LLMs
Addressing performance degradation in latent reasoning methods
Structuring latent space via vocabulary probability superposition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent reasoning uses vocabulary-space superposition
Two-stage training with specialized attention masks
Autonomous latent token generation trained with joint KL and CE losses (sketched below)
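
A hedged sketch of how the stage-2 objective described above might combine the two losses; the tensor layout, the targets taken from the stage-1 Latent Token Encoder, and the loss weighting are assumptions for illustration, not the paper's exact formulation:

```python
import torch.nn.functional as F

def stage2_loss(latent_logits, teacher_latent_probs, answer_logits, answer_ids, kl_weight=1.0):
    """Joint KL + CE objective for stage-2 training (illustrative only).

    latent_logits:        (B, L_lat, vocab) model logits at latent positions
    teacher_latent_probs: (B, L_lat, vocab) target vocabulary distributions
                          produced by the stage-1 Latent Token Encoder
    answer_logits:        (B, L_ans, vocab) model logits at answer positions
    answer_ids:           (B, L_ans) gold answer token ids
    """
    # KL term: match the model's latent-position distributions to the encoder's.
    kl = F.kl_div(
        F.log_softmax(latent_logits, dim=-1),
        teacher_latent_probs,
        reduction="batchmean",
    )
    # CE term: supervise the explicit answer tokens.
    ce = F.cross_entropy(
        answer_logits.reshape(-1, answer_logits.size(-1)),
        answer_ids.reshape(-1),
    )
    return ce + kl_weight * kl
```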
🔎 Similar Papers
No similar papers found.
Jingcheng Deng
Institute of Computing Technology, Chinese Academy of Sciences
Retrieval-Augmented Model, LLM Multi-Agent

Liang Pang
Associate Professor, Institute of Computing Technology, Chinese Academy of Sciences
Large Language Model, Semantic Matching, Question Answering, Text Matching, Text Generation

Zihao Wei
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Shichen Xu
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Zenghao Duan
CAS Key Laboratory of AI Safety, Institute of Computing Technology, CAS
large language model

Kun Xu
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Yang Song
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Huawei Shen
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Xueqi Cheng
Ph.D. student, Florida State University
Data mining, LLM, GNN, Computational social science