🤖 AI Summary
Existing sampling-based decoding methods for LLMs are highly sensitive to hyperparameters such as temperature, often requiring task-specific tuning. This work proposes *p*-less sampling, a hyperparameter-free adaptive truncation strategy that, at each decoding step, derives a threshold for the effective support set of the next-token probability distribution from its information entropy, enabling temperature-robust, high-quality generation. Its key innovation is to base truncation on a *global* property of the full distribution rather than on fixed head-of-distribution cutoffs such as top-*k* or nucleus (top-*p*) sampling, giving parameter-free control of the sampling space. Experiments across mathematical reasoning, logical inference, and creative writing tasks demonstrate that *p*-less sampling consistently outperforms mainstream decoding strategies: it yields more concise outputs, achieves faster inference, and shows markedly less quality degradation at high temperature settings.
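The summary does not reproduce the paper's exact threshold rule, so the following is only a minimal sketch of what an entropy-derived, hyperparameter-free truncation can look like. The cutoff $p_i \ge 2^{-H}$ (the probability of a uniform draw over the distribution's effective support, whose size is the perplexity $2^{H}$) and the function name `entropy_truncated_sample` are illustrative assumptions, not the paper's actual formulation:

```python
# A minimal sketch in the spirit of p-less sampling as described above.
# ASSUMPTION: the cutoff p_i >= 2**(-H) used here is an illustrative
# stand-in for the paper's entropy-based threshold, not its exact rule.
import numpy as np

def entropy_truncated_sample(logits: np.ndarray, temperature: float = 1.0,
                             rng: np.random.Generator | None = None) -> int:
    """Sample a token id after an entropy-derived, hyperparameter-free truncation."""
    rng = rng or np.random.default_rng()
    z = logits / temperature
    probs = np.exp(z - z.max())           # numerically stable softmax
    probs /= probs.sum()
    # Shannon entropy (in bits) of the *entire* next-token distribution,
    # a global statistic rather than a head-of-distribution cutoff.
    logp = np.log2(np.clip(probs, 1e-300, 1.0))
    h = -(probs * logp).sum()
    # Adaptive cutoff: keep tokens at least as likely as a uniform draw
    # over the distribution's effective support (perplexity = 2**h).
    keep = probs >= 2.0 ** (-h)
    if not keep.any():                    # float-safety fallback
        keep = probs == probs.max()
    trunc = np.where(keep, probs, 0.0)
    trunc /= trunc.sum()                  # renormalize over the kept support
    return int(rng.choice(len(trunc), p=trunc))
```

One reason this style of rule needs no tunable floor: the Shannon entropy never falls below the min-entropy ($H \ge -\log_2 \max_i p_i$), so the most likely token always survives the cutoff.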
📝 Abstract
Obtaining high-quality outputs from Large Language Models (LLMs) often depends upon the choice of a sampling-based decoding strategy to probabilistically select the next token at each generation step. While a variety of such sampling methods have been proposed, their performance can be sensitive to the selection of hyperparameters, which may require different settings depending upon the generation task and temperature configuration. In this work, we introduce $p$-less sampling: an information-theoretic approach to sampling which dynamically sets a truncation threshold at each decoding step based on the entire token probability distribution. Unlike existing methods, $p$-less sampling has no hyperparameters and consistently produces high-quality outputs as temperature increases. We provide theoretical perspectives on $p$-less sampling to ground our proposed method and conduct experiments to empirically validate its effectiveness across a range of math, logical reasoning, and creative writing tasks. Our results demonstrate how $p$-less sampling consistently outperforms existing sampling approaches while exhibiting much less degradation in text quality at higher temperature values. We further show how $p$-less sampling achieves greater inference-time efficiency than alternative methods through lower average token sampling times and shorter generation lengths, without sacrificing accuracy. Finally, we provide analyses highlighting the benefits of $p$-less sampling through qualitative examples, case studies, and diversity assessments.
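As a rough illustration of the temperature-robustness claim, the toy loop below (made-up logits, and the same illustrative $2^{-H}$ cutoff assumed in the sketch above, not the paper's rule) shows an entropy-derived threshold widening the kept set on its own as temperature flattens the distribution, where a fixed top-$k$ size or top-$p$ budget would need to be retuned by hand:

```python
# Toy demonstration with made-up logits: an entropy-derived cutoff
# (illustrative 2**(-H) rule) adapts automatically as temperature rises.
import numpy as np

logits = np.array([5.0, 3.5, 3.0, 1.0, 0.5, -1.0, -2.0, -3.0])
for t in (0.7, 1.0, 1.5, 2.0):
    z = logits / t
    p = np.exp(z - z.max())
    p /= p.sum()
    h = -(p * np.log2(p)).sum()           # entropy of the full distribution
    kept = int((p >= 2.0 ** (-h)).sum())  # size of the truncated support
    print(f"T={t:<3}  H={h:.2f} bits  kept {kept}/{len(p)} tokens")
```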