Top-b: Entropic Regulation of Relative Probability Bands in Autoregressive Language Processes

πŸ“… 2026-03-15
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing decoding strategies such as Top-k and Top-p rely on static truncation, which struggles to adapt to the dynamically varying information density of language generation and often fails to balance creativity with logical coherence. This work proposes Top-b decoding, the first approach to formalize decoding as an entropy-driven dynamic bandwidth modulation mechanism over a relative probability manifold. By coupling an adaptive bandwidth coefficient strictly to the model's instantaneous Shannon entropy, Top-b dynamically adjusts the size of the candidate token set. Theoretical analysis demonstrates that Top-b acts as a variance-minimizing operator on the tail distribution, establishing an approximately self-regulating generative control system. Empirical results on benchmarks such as GPQA and GSM8K show that the method significantly reduces generation entropy and inter-decoding variance while maintaining competitive reasoning accuracy.

πŸ“ Abstract
Probabilistic language generators are theoretically modeled as discrete stochastic processes, yet standard decoding strategies (Top-k, Top-p) impose static truncation rules that fail to accommodate the dynamic information density of natural language. This misalignment often forces a suboptimal trade-off: static bounds are either too restrictive for high-entropy creative generation or too permissive for low-entropy logical reasoning. In this work, we formalize the generation process as a trajectory through a relative probability manifold. We introduce Top-b (Adaptive Relative Band Sampling), a decoding strategy that regulates the candidate set via a dynamic bandwidth coefficient coupled strictly to the instantaneous Shannon entropy of the model's distribution. We provide a theoretical framework demonstrating that Top-b acts as a variance-minimizing operator on the tail distribution. Empirical validation on GPQA and GSM8K benchmarks indicates that Top-b significantly reduces generation entropy and inter-decoding variance while maintaining competitive reasoning accuracy, effectively approximating a self-regulating control system for autoregressive generation.
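The abstract describes the mechanism only at a high level. A minimal sketch of the idea follows, under stated assumptions: tokens are kept if their probability lies within a band relative to the most likely token, and the band widens as the distribution's normalized Shannon entropy rises. The function name `top_b_sample`, the bounds `b_min`/`b_max`, and the linear entropy coupling are all illustrative assumptions, not the paper's exact formulation.

```python
import math
import random

def top_b_sample(probs, b_min=0.05, b_max=0.5, rng=random):
    """Sketch of entropy-coupled relative-band sampling ("Top-b").

    Keeps tokens whose probability is at least b * p_max, where the
    bandwidth coefficient b shrinks (widening the band) as the
    normalized Shannon entropy of `probs` grows.
    """
    # Instantaneous Shannon entropy, normalized by its maximum log|V|.
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    h_norm = h / math.log(len(probs)) if len(probs) > 1 else 0.0

    # Entropy-coupled bandwidth: high entropy -> small b -> wide band.
    # Linear interpolation between b_max and b_min is an assumption.
    b = b_max - (b_max - b_min) * h_norm

    # Relative probability band: retain tokens with p >= b * p_max.
    threshold = b * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= threshold]

    # Renormalize over the candidate set and sample.
    total = sum(p for _, p in kept)
    indices = [i for i, _ in kept]
    weights = [p / total for _, p in kept]
    return rng.choices(indices, weights=weights, k=1)[0], indices
```

On a peaked (low-entropy) distribution the threshold stays high and the candidate set collapses toward greedy decoding; on a flat (high-entropy) distribution the band widens to admit more candidates, mirroring the restrictive-vs-permissive trade-off the abstract attributes to static Top-k/Top-p bounds.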
Problem

Research questions and friction points this paper is trying to address.

autoregressive language models
decoding strategies
information entropy
probability truncation
dynamic adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive decoding
Shannon entropy
relative probability manifold
autoregressive generation
variance minimization