Decoding in Geometry: Alleviating Embedding-Space Crowding for Complex Reasoning

📅 2026-01-30
🤖 AI Summary
This work addresses a limitation of existing sampling-based decoding methods, which operate solely on token probabilities and ignore the geometric relationships among tokens in the embedding space. The authors identify a phenomenon they call embedding-space crowding, in which the next-token distribution concentrates its probability mass on geometrically close tokens. They quantify crowding at multiple granularities and find a statistical association between crowding and reasoning success in mathematical problem solving. To mitigate crowding, they propose CraEG, a plug-and-play sampling method that reweights the next-token distribution using the geometry of the embedding space. CraEG is training-free, single-pass, and compatible with standard sampling strategies. Experiments on multiple models and benchmarks show improved generation performance, with gains in robustness and diversity metrics.

📝 Abstract
Sampling-based decoding underlies complex reasoning in large language models (LLMs), where decoding strategies critically shape model behavior. Temperature- and truncation-based methods reshape the next-token distribution through global probability reweighting or thresholding to balance the quality-diversity tradeoff. However, they operate solely on token probabilities, ignoring fine-grained relationships among tokens in the embedding space. We uncover a novel phenomenon, embedding-space crowding, where the next-token distribution concentrates its probability mass on geometrically close tokens in the embedding space. We quantify crowding at multiple granularities and find a statistical association with reasoning success in mathematical problem solving. Motivated by this finding, we propose CraEG, a plug-and-play sampling method that mitigates crowding through geometry-guided reweighting. CraEG is training-free, single-pass, and compatible with standard sampling strategies. Experiments on multiple models and benchmarks demonstrate improved generation performance, with gains in robustness and diversity metrics.
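
The abstract does not give CraEG's formulas, so the following is only an illustrative sketch of the two ideas it names, not the authors' method. It assumes a hypothetical crowding measure (the expected cosine similarity between two distinct tokens drawn from the next-token distribution) and a hypothetical reweighting rule (each token is discounted by the probability mass of more likely tokens that point in a similar direction). The function names, the `strength` parameter, and the toy embeddings are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Standard temperature-scaled softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def crowding_score(probs, embeddings):
    """Hypothetical metric (not from the paper): expected cosine similarity
    between two distinct candidate tokens, weighted by their probabilities."""
    num, den = 0.0, 0.0
    for i in range(len(probs)):
        for j in range(len(probs)):
            if i != j:
                w = probs[i] * probs[j]
                num += w * cosine(embeddings[i], embeddings[j])
                den += w
    return num / den if den > 0 else 0.0

def geometry_reweight(probs, embeddings, strength=1.0):
    """Hypothetical reweighting (not from the paper): discount each token by
    the mass of strictly more probable tokens with positive similarity to it,
    then renormalize."""
    adjusted = []
    for i in range(len(probs)):
        covered = sum(
            probs[j] * max(0.0, cosine(embeddings[i], embeddings[j]))
            for j in range(len(probs))
            if probs[j] > probs[i]
        )
        adjusted.append(probs[i] * math.exp(-strength * covered))
    total = sum(adjusted)
    return [a / total for a in adjusted]

# Toy example: three near-duplicate directions and one distinct direction.
toy_embeddings = [[1.0, 0.0], [0.99, 0.1], [0.98, -0.1], [0.0, 1.0]]
toy_probs = softmax([2.0, 1.8, 1.6, 1.0])
reweighted = geometry_reweight(toy_probs, toy_embeddings)
print(crowding_score(toy_probs, toy_embeddings))
print(crowding_score(reweighted, toy_embeddings))
```

In this toy setup the reweighted distribution shifts mass away from the lower-ranked near-duplicates and toward the geometrically distinct token, which lowers the hypothetical crowding score. Because the sketch only post-processes a probability vector, it could in principle be applied after temperature scaling or truncation, in line with the abstract's description of CraEG as compatible with standard sampling strategies.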
Problem

Research questions and friction points this paper is trying to address.

embedding-space crowding
complex reasoning
sampling-based decoding
large language models
next-token distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

embedding-space crowding
geometry-guided reweighting
sampling-based decoding
CraEG
complex reasoning
Authors
Yixin Yang
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Qingxiu Dong
Peking University
Natural Language Processing, Machine Learning
Zhifang Sui
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University