Diversified Sampling Improves Scaling LLM Inference

📅 2025-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Scaling computational resources during large language model (LLM) inference often yields diminishing returns due to insufficient output diversity, leading to inefficient sampling and suboptimal performance. Method: We propose DivSampling, a novel prompt perturbation framework that jointly leverages task-agnostic and task-specific strategies to enhance output diversity—marking the first diversity-driven optimization of inference-time sampling. We provide theoretical analysis showing that increased output diversity significantly reduces response error rates. Contribution/Results: Evaluated across diverse benchmarks—including mathematical reasoning, commonsense reasoning, and code generation—DivSampling consistently improves Pass@10 accuracy. It is task-agnostic, scalable, and computationally efficient, overcoming fundamental limitations of conventional top-k and temperature-based sampling methods without requiring architectural modifications or additional training.
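The summary reports gains in Pass@10, i.e. the probability that at least one of 10 sampled candidates is correct. For reference, Pass@k is conventionally computed with the unbiased estimator from the Codex evaluation literature (Chen et al., 2021); a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: the probability that at least one of k
    samples drawn without replacement from n total candidates (of which
    c are correct) is correct.  pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect candidates exist, so any k-subset
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n=10 samples of which c=3 are correct, `pass_at_k(10, 3, 10)` is 1.0, while `pass_at_k(10, 3, 1)` reduces to the per-sample accuracy 0.3.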

📝 Abstract
While increasing training compute has significantly improved the performance of large language models (LLMs), similar gains have not been observed when scaling inference compute. We hypothesize that the primary issue lies in the uniformity of LLM outputs, which leads to inefficient sampling as models repeatedly generate similar but inaccurate responses. Motivated by an intriguing relationship between solution accuracy (Pass@10) and response diversity, we propose DivSampling, a novel and versatile sampling technique designed to enhance the diversity of candidate solutions by introducing prompt perturbations. DivSampling incorporates two categories of perturbations: task-agnostic approaches, which are general and not tailored to any specific task, and task-specific approaches, which are customized based on task content. Our theoretical analysis demonstrates that, under mild assumptions, the error rates of responses generated from diverse prompts are significantly lower compared to those produced by stationary prompts. Comprehensive evaluations across various tasks, including reasoning, mathematics, and code generation, highlight the effectiveness of DivSampling in improving solution accuracy. This scalable and efficient approach offers a new perspective on optimizing test-time inference, addressing limitations in current sampling strategies.
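The abstract describes drawing each candidate from a perturbed prompt rather than sampling repeatedly from one stationary prompt. A minimal sketch of that idea, assuming a hypothetical `generate` callable standing in for any LLM API, and illustrative task-agnostic perturbation prefixes that are not taken from the paper:

```python
import random

# Illustrative task-agnostic perturbations (examples for the sketch only;
# the paper also uses task-specific perturbations customized to task content).
PERTURBATIONS = [
    "Think step by step.",
    "Consider an alternative approach.",
    "Double-check each intermediate result.",
]

def div_sample(prompt, generate, n=10, seed=0):
    """Draw n candidate solutions, each from a randomly perturbed prompt,
    instead of n draws from the same stationary prompt."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n):
        prefix = rng.choice(PERTURBATIONS)
        candidates.append(generate(f"{prefix}\n{prompt}"))
    return candidates
```

The intended effect is that the n candidates cover more of the answer distribution, so Pass@k improves without any change to the model, its decoding temperature, or its top-k settings.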
Problem

Research questions and friction points this paper is trying to address.

Enhances diversity in LLM outputs
Reduces error rates with diverse prompts
Improves accuracy in reasoning and code generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

DivSampling enhances solution diversity
Introduces task-agnostic and task-specific perturbations
Reduces error rates with diverse prompts
Authors
Tianchun Wang (The Pennsylvania State University)
Zichuan Liu (Nanjing University)
Yuanzhou Chen (University of California, Los Angeles)
Haifeng Chen (NEC Laboratories America)
Xiang Zhang (The Pennsylvania State University)
Wei Cheng (NEC Laboratories America)