The Illusion of Superposition? A Principled Analysis of Latent Thinking in Language Models

📅 2026-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether language models genuinely leverage superposition, i.e., maintaining multiple candidate solutions within a single continuous representation, during latent chain-of-thought (Latent CoT) reasoning. Across three regimes (a training-free setup using off-the-shelf pretrained models, fine-tuning a base model to produce latent thoughts, and training a model from scratch) and using Logit Lens analysis alongside entity-level probing of internal representations, the work provides the first empirical evidence that only models trained from scratch exhibit genuine superposition behavior. In contrast, pretrained models tend to rely on linguistic shortcuts: language biases acquired during pretraining and representational capacity constraints lead to premature collapse of the superposition. These findings clarify the conditions needed to sustain superposition during reasoning, the mechanisms underlying its failure, and the critical role of training methodology in shaping a model's capacity for complex, multi-hypothesis reasoning.
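The Logit Lens analysis mentioned above projects each intermediate hidden state through the model's unembedding matrix to see which tokens a layer currently favors. The sketch below illustrates the general technique (it is not the authors' code); the module paths `model.model.norm` and `model.lm_head` assume a Llama-style Hugging Face causal LM and may differ for other architectures.

```python
import torch

@torch.no_grad()
def logit_lens(model, tokenizer, input_ids, top_k=5):
    """Decode the top-k tokens per layer at the last position via the unembedding matrix."""
    outputs = model(input_ids, output_hidden_states=True)
    per_layer = []
    for hidden in outputs.hidden_states:           # one tensor per layer: (batch, seq, dim)
        h = model.model.norm(hidden[:, -1, :])     # final norm on the last position (assumed module path)
        logits = model.lm_head(h)                  # project into vocabulary space
        top = logits.softmax(dim=-1).topk(top_k, dim=-1)
        per_layer.append([(tokenizer.decode(int(idx)), prob.item())
                          for idx, prob in zip(top.indices[0], top.values[0])])
    return per_layer
```

If superposition is maintained, several candidate tokens should keep non-trivial probability across layers; a premature collapse shows up as one token dominating early.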
📝 Abstract
Latent reasoning via continuous chain-of-thought (Latent CoT) has emerged as a promising alternative to discrete CoT reasoning. Operating in continuous space increases expressivity and has been hypothesized to enable superposition: the ability to maintain multiple candidate solutions simultaneously within a single representation. Despite theoretical arguments, it remains unclear whether language models actually leverage superposition when reasoning with latent CoTs. We investigate this question across three regimes: a training-free regime that constructs latent thoughts as convex combinations of token embeddings, a fine-tuned regime where a base model is adapted to produce latent thoughts, and a from-scratch regime where a model is trained entirely with latent thoughts to solve a given task. Using Logit Lens and entity-level probing to analyze internal representations, we find that only models trained from scratch exhibit signs of using superposition. In the training-free and fine-tuned regimes, the superposition either collapses or is not used at all, with models discovering shortcut solutions instead. We argue that this is due to two complementary phenomena: (i) pretraining on natural language data biases models to commit to a token in the last layers, and (ii) model capacity strongly influences which solutions a model favors. Together, our results offer a unified explanation for when and why superposition arises in continuous chain-of-thought reasoning, and identify the conditions under which it collapses.
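The training-free regime described in the abstract builds a latent thought as a convex combination of token embeddings. A minimal sketch of that construction is shown below; the function name, weights, and candidate tokens are hypothetical and only illustrate the convex-mixing idea, not the paper's exact procedure.

```python
import torch

def superposed_thought(embedding_matrix, candidate_ids, weights):
    """Mix candidate-token embeddings with non-negative weights that sum to 1."""
    weights = torch.as_tensor(weights, dtype=embedding_matrix.dtype)
    weights = weights / weights.sum()                        # enforce the convex constraint
    candidates = embedding_matrix[candidate_ids]             # (num_candidates, dim)
    return (weights.unsqueeze(-1) * candidates).sum(dim=0)   # (dim,) single latent thought

# e.g. keep two candidate answers alive with equal weight (token ids are placeholders):
# emb = model.get_input_embeddings().weight
# thought = superposed_thought(emb, torch.tensor([token_a, token_b]), [0.5, 0.5])
```

The resulting vector is fed back in place of a discrete token embedding, so the model can, in principle, carry both candidates forward in a single continuous "thought".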
Problem

Research questions and friction points this paper is trying to address.

superposition
latent chain-of-thought
language models
reasoning
internal representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

superposition
latent chain-of-thought
continuous reasoning
Logit Lens
representation probing