Robust Length Prediction: A Perspective from Heavy-Tailed Prompt-Conditioned Distributions

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the unreliability of output-length prediction in existing large language model serving systems, which typically assume a deterministic mapping from prompt to output length and overlook the inherent stochasticity and heavy-tailed nature of actual response lengths. The authors are, to their knowledge, the first to demonstrate that output lengths conditioned on a given prompt follow a heavy-tailed distribution. To tackle this, they introduce a paradigm combining robust median-based point prediction with full distributional prediction that preserves uncertainty. Specifically, they propose two methods, ProD-M and ProD-D, that leverage the served model's hidden states, multi-round sampling, median estimation, and explicit distribution modeling, accompanied by a theoretical error analysis. Experiments across diverse scenarios show that the approach significantly improves prediction accuracy, thereby enhancing batching efficiency, memory reservation, and scheduling performance.
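The summary's core claim is that a median over multiple generations is a more robust point target than any single sampled length when the length distribution is heavy-tailed. A toy illustration (not the paper's experiment; the lognormal is a stand-in heavy-tailed distribution, not necessarily the paper's model) shows how much less the sample median varies across small batches of generations than the sample mean:

```python
import numpy as np

rng = np.random.default_rng(42)

def batch_estimates(n_batches=2000, k=8, sigma=1.5):
    """For many prompts, draw k generation lengths each from a
    heavy-tailed (lognormal) distribution and return per-batch
    mean and median point estimates."""
    lengths = rng.lognormal(mean=5.0, sigma=sigma, size=(n_batches, k))
    return lengths.mean(axis=1), np.median(lengths, axis=1)

means, medians = batch_estimates()
# Under heavy tails, rare very long outputs inflate the mean's
# variance; the median stays comparatively stable.
print("std of batch means:  ", means.std())
print("std of batch medians:", medians.std())
```

The same instability afflicts a one-shot sampled label, which is effectively a batch of size one.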
📝 Abstract
Output-length prediction is important for efficient LLM serving, as it directly affects batching, memory reservation, and scheduling. For prompt-only length prediction, most existing methods use a one-shot sampled length as the label, implicitly treating each prompt as if it had one true target length. We show that this is unreliable: even under a fixed model and decoding setup, the same prompt induces a prompt-conditioned output length distribution, not a deterministic scalar, and this distribution is consistent with heavy-tailed behavior. Motivated by this, we cast length prediction as robust estimation from heavy-tailed prompt-conditioned length distributions. We propose prompt-conditioned length distribution (ProD) methods, which construct training targets from multiple independent generations of the same prompt. Two variants are developed to reuse the served LLM's hidden states: ProD-M, which uses a median-based target for robust point prediction, and ProD-D, which uses a distributional target that preserves prompt-conditioned uncertainty. We provide theoretical justifications by analyzing the estimation error under a surrogate model. Experiments across diverse scenarios show consistent gains in prediction quality.
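The abstract describes constructing training targets from multiple independent generations per prompt. A minimal sketch of the two target types, assuming a hypothetical heavy-tailed length sampler in place of real LLM generations and illustrative bucket boundaries (neither is specified in the text above):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lengths(n, mu=5.0, sigma=1.0):
    """Stand-in for k independent generations of one prompt:
    lengths drawn from a heavy-tailed (lognormal) distribution."""
    return rng.lognormal(mean=mu, sigma=sigma, size=n).astype(int)

# Baseline the paper critiques: a single sampled length as the label.
one_shot_label = sample_lengths(1)[0]

# ProD-M-style target: median of k independent generations,
# robust to the occasional very long output.
k = 8
samples = sample_lengths(k)
median_target = int(np.median(samples))

# ProD-D-style target: an empirical distribution over length buckets,
# preserving prompt-conditioned uncertainty instead of collapsing it.
buckets = np.array([0, 128, 256, 512, 1024, 2048, 4096, np.inf])
hist, _ = np.histogram(samples, bins=buckets)
dist_target = hist / hist.sum()
```

A predictor trained on `median_target` does robust point prediction; one trained on `dist_target` (e.g. with a cross-entropy loss over buckets) keeps the uncertainty the abstract argues should be preserved.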
Problem

Research questions and friction points this paper is trying to address.

length prediction · heavy-tailed distribution · prompt-conditioned distribution · LLM serving · output length
Innovation

Methods, ideas, or system contributions that make the work stand out.

length prediction · heavy-tailed distribution · prompt-conditioned uncertainty · robust estimation · large language models