Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle to simultaneously ensure logical coherence and generative diversity in open-ended text generation. Existing truncation-based sampling methods (e.g., top-p, min-p) lack explicit modeling of model confidence, hindering a dynamic trade-off between these objectives. To address this, we propose top-H decoding: a novel sampling strategy grounded in a bounded-entropy constraint that maximizes retained probability mass while formally linking creativity (diversity) and coherence (low entropy). This yields an NP-hard entropy-constrained optimization problem, for which we design an efficient greedy approximation algorithm. Top-H decoding outperforms min-p sampling by up to 25.63% on creative writing benchmarks while maintaining strong robustness across reasoning and instruction-following tasks, including GPQA, GSM8K, and MT-Bench, and an LLM-as-judge evaluation confirms high-quality generation even under elevated temperature settings.

📝 Abstract
Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. Existing truncated sampling techniques, including temperature scaling, top-$p$ (nucleus) sampling, and min-$p$ sampling, aim to manage this trade-off, but they fall short of effectively incorporating the model's confidence into the sampling strategy. For example, min-$p$ sampling relies on a single top token as a heuristic for confidence, ultimately underutilizing the information in the probability distribution. To incorporate model confidence effectively, in this paper we present **top-H** decoding. We first establish the theoretical foundation of the interplay between creativity and coherence in truncated sampling by formulating an **entropy-constrained minimum divergence** problem. We then prove that this minimization problem is equivalent to an **entropy-constrained mass maximization** (ECMM) problem, which is NP-hard. Finally, we present top-H decoding, a computationally efficient greedy algorithm that approximately solves the ECMM problem. Extensive empirical evaluations demonstrate that top-H outperforms the state-of-the-art (SoTA) alternative, min-$p$ sampling, by up to **25.63%** on creative writing benchmarks, while maintaining robustness on question-answering datasets such as GPQA, GSM8K, and MT-Bench. Additionally, an *LLM-as-judge* evaluation confirms that top-H indeed produces coherent outputs even at higher temperatures, where creativity is especially critical. In summary, top-H advances the SoTA in open-ended text generation and can be *easily integrated* into creative writing applications. The code is available at https://github.com/ErfanBaghaei/Top-H-Decoding.

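The greedy approximation described in the abstract can be illustrated with a minimal sketch: add tokens in descending-probability order as long as the entropy of the renormalized truncated distribution stays within a given bound. This is an assumption-laden illustration, not the paper's exact algorithm (the function name `top_h_filter` and the way the entropy bound `h_max` is supplied are hypothetical; see the linked repository for the authors' implementation):

```python
import numpy as np

def top_h_filter(probs, h_max):
    """Greedy sketch of entropy-constrained mass maximization (ECMM):
    keep the most probable tokens while the renormalized truncated
    distribution's entropy stays <= h_max. Illustrative only."""
    order = np.argsort(probs)[::-1]  # token indices, most probable first
    kept = []
    for idx in order:
        candidate = kept + [idx]
        p = probs[candidate]
        p = p / p.sum()  # renormalize the truncated set
        entropy = -np.sum(p * np.log(p + 1e-12))
        if entropy > h_max and kept:
            break  # adding this token would violate the entropy bound
        kept.append(idx)
    # zero out truncated tokens and renormalize the survivors
    out = np.zeros_like(probs)
    out[kept] = probs[kept]
    return out / out.sum()

# Example: with a tight bound, only the high-mass head survives truncation.
dist = np.array([0.5, 0.3, 0.15, 0.05])
print(top_h_filter(dist, h_max=0.7))
```

Because entropy grows as lower-probability tokens are admitted, the greedy loop keeps the largest feasible probability mass under the bound, which is the trade-off the ECMM formulation encodes: a smaller bound yields a more coherent (lower-entropy) truncation, a larger bound a more diverse one.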
Problem

Research questions and friction points this paper is trying to address.

Balancing diversity and coherence in open-ended text generation
Incorporating model confidence into sampling strategies effectively
Solving entropy-constrained mass maximization for creative coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Top-H decoding balances creativity and coherence
Solves entropy-constrained mass maximization problem
Greedy algorithm outperforms min-p sampling