Top-b: Entropic Regulation of Relative Probability Bands in Autoregressive Language Processes

πŸ“… 2026-03-15
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing decoding strategies such as Top-k and Top-p rely on static truncation, which struggles to adapt to the dynamically varying information density of language generation and often fails to balance creativity with logical coherence. This work proposes Top-b decoding, the first approach to formalize decoding as an entropy-driven dynamic bandwidth modulation mechanism over a relative probability manifold. By coupling an adaptive bandwidth coefficient strictly to the model's instantaneous Shannon entropy, Top-b dynamically adjusts the size of the candidate token set. Theoretical analysis demonstrates that Top-b acts as a variance-minimizing operator on the tail distribution, establishing an approximately self-regulating generative control system. Empirical results on benchmarks such as GPQA and GSM8K show that the method significantly reduces generation entropy and inter-decoding variance while maintaining competitive reasoning accuracy.

πŸ“ Abstract
Probabilistic language generators are theoretically modeled as discrete stochastic processes, yet standard decoding strategies (Top-k, Top-p) impose static truncation rules that fail to accommodate the dynamic information density of natural language. This misalignment often forces a suboptimal trade-off: static bounds are either too restrictive for high-entropy creative generation or too permissive for low-entropy logical reasoning. In this work, we formalize the generation process as a trajectory through a relative probability manifold. We introduce Top-b (Adaptive Relative Band Sampling), a decoding strategy that regulates the candidate set via a dynamic bandwidth coefficient coupled strictly to the instantaneous Shannon entropy of the model's distribution. We provide a theoretical framework demonstrating that Top-b acts as a variance-minimizing operator on the tail distribution. Empirical validation on GPQA and GSM8K benchmarks indicates that Top-b significantly reduces generation entropy and inter-decoding variance while maintaining competitive reasoning accuracy, effectively approximating a self-regulating control system for autoregressive generation.
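The abstract describes the mechanism only at a high level. A minimal sketch of the idea follows, under stated assumptions: tokens are kept if their probability lies within a band relative to the most likely token, and the band widens as the distribution's normalized Shannon entropy rises. The function name `top_b_sample`, the bounds `b_min`/`b_max`, and the linear entropy coupling are all illustrative assumptions, not the paper's exact formulation.

```python
import math
import random

def top_b_sample(probs, b_min=0.05, b_max=0.5, rng=random):
    """Sketch of entropy-coupled relative-band sampling ("Top-b").

    Keeps tokens whose probability is at least b * p_max, where the
    bandwidth coefficient b shrinks (widening the band) as the
    normalized Shannon entropy of `probs` grows.
    """
    # Instantaneous Shannon entropy, normalized by its maximum log|V|.
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    h_norm = h / math.log(len(probs)) if len(probs) > 1 else 0.0

    # Entropy-coupled bandwidth: high entropy -> small b -> wide band.
    # Linear interpolation between b_max and b_min is an assumption.
    b = b_max - (b_max - b_min) * h_norm

    # Relative probability band: retain tokens with p >= b * p_max.
    threshold = b * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= threshold]

    # Renormalize over the candidate set and sample.
    total = sum(p for _, p in kept)
    indices = [i for i, _ in kept]
    weights = [p / total for _, p in kept]
    return rng.choices(indices, weights=weights, k=1)[0], indices
```

On a peaked (low-entropy) distribution the threshold stays high and the candidate set collapses toward greedy decoding; on a flat (high-entropy) distribution the band widens to admit more candidates, mirroring the restrictive-vs-permissive trade-off the abstract attributes to static Top-k/Top-p bounds.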
Problem

Research questions and friction points this paper is trying to address.

autoregressive language models
decoding strategies
information entropy
probability truncation
dynamic adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive decoding
Shannon entropy
relative probability manifold
autoregressive generation
variance minimization