🤖 AI Summary
Existing sampling-based decoding methods for LLMs are highly sensitive to hyperparameters such as temperature, often requiring task-specific tuning. This work proposes *p*-less sampling, a hyperparameter-free adaptive truncation strategy that, at each decoding step, derives a threshold for the effective support set of the next-token probability distribution from its information entropy, enabling temperature-robust, high-quality generation. Its key innovation is to base truncation on a *global* property of the full distribution rather than on fixed head-of-distribution cutoffs such as top-*k* or nucleus (top-*p*) sampling, giving parameter-free control of the sampling space. Experiments across mathematical reasoning, logical inference, and creative writing tasks demonstrate that *p*-less sampling consistently outperforms mainstream decoding strategies: it yields more concise outputs, achieves faster inference, and shows markedly less quality degradation at high temperature settings.
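The summary does not reproduce the paper's exact threshold rule, so the following is only a minimal sketch of what an entropy-derived, hyperparameter-free truncation can look like. The cutoff $p_i \ge 2^{-H}$ (the probability of a uniform draw over the distribution's effective support, whose size is the perplexity $2^{H}$) and the function name `entropy_truncated_sample` are illustrative assumptions, not the paper's actual formulation:

```python
# A minimal sketch in the spirit of p-less sampling as described above.
# ASSUMPTION: the cutoff p_i >= 2**(-H) used here is an illustrative
# stand-in for the paper's entropy-based threshold, not its exact rule.
import numpy as np

def entropy_truncated_sample(logits: np.ndarray, temperature: float = 1.0,
                             rng: np.random.Generator | None = None) -> int:
    """Sample a token id after an entropy-derived, hyperparameter-free truncation."""
    rng = rng or np.random.default_rng()
    z = logits / temperature
    probs = np.exp(z - z.max())           # numerically stable softmax
    probs /= probs.sum()
    # Shannon entropy (in bits) of the *entire* next-token distribution,
    # a global statistic rather than a head-of-distribution cutoff.
    logp = np.log2(np.clip(probs, 1e-300, 1.0))
    h = -(probs * logp).sum()
    # Adaptive cutoff: keep tokens at least as likely as a uniform draw
    # over the distribution's effective support (perplexity = 2**h).
    keep = probs >= 2.0 ** (-h)
    if not keep.any():                    # float-safety fallback
        keep = probs == probs.max()
    trunc = np.where(keep, probs, 0.0)
    trunc /= trunc.sum()                  # renormalize over the kept support
    return int(rng.choice(len(trunc), p=trunc))
```

One reason this style of rule needs no tunable floor: the Shannon entropy never falls below the min-entropy ($H \ge -\log_2 \max_i p_i$), so the most likely token always survives the cutoff.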
📝 Abstract
Obtaining high-quality outputs from Large Language Models (LLMs) often depends upon the choice of a sampling-based decoding strategy to probabilistically select the next token at each generation step. While a variety of such sampling methods have been proposed, their performance can be sensitive to the selection of hyperparameters, which may require different settings depending upon the generation task and temperature configuration. In this work, we introduce $p$-less sampling: an information-theoretic approach to sampling which dynamically sets a truncation threshold at each decoding step based on the entire token probability distribution. Unlike existing methods, $p$-less sampling has no hyperparameters and consistently produces high-quality outputs as temperature increases. We provide theoretical perspectives on $p$-less sampling to ground our proposed method and conduct experiments to empirically validate its effectiveness across a range of math, logical reasoning, and creative writing tasks. Our results demonstrate how $p$-less sampling consistently outperforms existing sampling approaches while exhibiting much less degradation in text quality at higher temperature values. We further show how $p$-less sampling achieves greater inference-time efficiency than alternative methods through lower average token sampling times and shorter generation lengths, without sacrificing accuracy. Finally, we provide analyses highlighting the benefits of $p$-less sampling through qualitative examples, case studies, and diversity assessments.
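As a rough illustration of the temperature-robustness claim, the toy loop below (made-up logits, and the same illustrative $2^{-H}$ cutoff assumed in the sketch above, not the paper's rule) shows an entropy-derived threshold widening the kept set on its own as temperature flattens the distribution, where a fixed top-$k$ size or top-$p$ budget would need to be retuned by hand:

```python
# Toy demonstration with made-up logits: an entropy-derived cutoff
# (illustrative 2**(-H) rule) adapts automatically as temperature rises.
import numpy as np

logits = np.array([5.0, 3.5, 3.0, 1.0, 0.5, -1.0, -2.0, -3.0])
for t in (0.7, 1.0, 1.5, 2.0):
    z = logits / t
    p = np.exp(z - z.max())
    p /= p.sum()
    h = -(p * np.log2(p)).sum()           # entropy of the full distribution
    kept = int((p >= 2.0 ** (-h)).sum())  # size of the truncated support
    print(f"T={t:<3}  H={h:.2f} bits  kept {kept}/{len(p)} tokens")
```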