List-Level Distribution Coupling with Applications to Speculative Decoding and Lossy Compression

📅 2025-06-05
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work studies a list-level relaxation of distribution coupling: a list of candidates is sampled from one distribution, and the list is accepted if any candidate matches a single sample drawn from the other. The authors introduce generalized Gumbel-max sampling and a list-matching lemma, establishing a theoretical framework for list-level coupling that includes a lower bound on the token-level acceptance probability. Building on this framework, they design a multi-draft speculative decoding algorithm with draft-invariance guarantees and propose a distributed lossy compression scheme that exploits side information at the decoders. Experiments show the method matches SpecTr and SpecInfer in language-modeling performance while significantly improving the rate–distortion trade-off on Gaussian sources and MNIST.
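The coupling primitive the paper generalizes can be sketched in a few lines: draw one Gumbel noise vector and reuse it for the argmax draws from both distributions, so the two samples agree far more often than independent draws would. A minimal sketch, assuming small categorical distributions (the distributions, helper name, and trial count below are illustrative, not from the paper):

```python
import math
import random

def gumbel_coupled_pair(p, q, rng):
    """Draw one sample from p and one from q with a shared Gumbel
    noise vector, coupling the two draws (the Gumbel-max coupling
    that the paper extends to lists)."""
    g = [-math.log(-math.log(rng.random())) for _ in p]
    x = max(range(len(p)), key=lambda i: math.log(p[i]) + g[i])
    y = max(range(len(q)), key=lambda i: math.log(q[i]) + g[i])
    return x, y

rng = random.Random(0)
p = [0.6, 0.3, 0.1]
q = [0.5, 0.4, 0.1]
trials = 20000
agree = sum(x == y for x, y in (gumbel_coupled_pair(p, q, rng)
                                for _ in range(trials)))
rate = agree / trials
# Agreement is bounded above by the maximal coupling sum(min(p_i, q_i)) = 0.9
# here; independent sampling would only agree with probability 0.43.
print(f"agreement rate: {rate:.3f}")
```

Sharing the noise across the two argmax draws is what makes the construction reusable at the list level: fresh noise per candidate yields a list from one distribution that can still be matched against a coupled sample from the other.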

๐Ÿ“ Abstract
We study a relaxation of the problem of coupling probability distributions -- a list of samples is generated from one distribution and an acceptance is declared if any one of these samples is identical to the sample generated from the other distribution. We propose a novel method for generating samples, which extends the Gumbel-max sampling suggested in Daliri et al. (arXiv:2408.07978) for coupling probability distributions. We also establish a corresponding lower bound on the acceptance probability, which we call the list matching lemma. We next discuss two applications of our setup. First, we develop a new mechanism for multi-draft speculative sampling that is simple to implement and achieves performance competitive with baselines such as SpecTr and SpecInfer across a range of language tasks. Our method also guarantees a certain degree of drafter invariance with respect to the output tokens, which is not supported by existing schemes. We also provide a theoretical lower bound on the token-level acceptance probability. As our second application, we consider distributed lossy compression with side information in a setting where a source sample is compressed and available to multiple decoders, each with independent side information. We propose a compression technique that is based on our generalization of Gumbel-max sampling and show that it provides significant gains in experiments involving synthetic Gaussian sources and the MNIST image dataset.
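To see why the list-level relaxation helps, consider the naive baseline in which the k list entries are drawn i.i.d. from q, independently of the single target sample from p: even without any coupling, the acceptance probability grows quickly with k. A small sketch under that assumption (the example distributions are illustrative; the paper's coupled scheme is designed to improve on this baseline):

```python
def list_accept_prob(p, q, k):
    """Acceptance probability of the list-level relaxation when the k
    list entries are drawn i.i.d. from q, independently of the single
    target sample from p (a naive baseline, not the paper's coupled
    generalized Gumbel-max scheme)."""
    return sum(pi * (1.0 - (1.0 - qi) ** k) for pi, qi in zip(p, q))

p = [0.6, 0.3, 0.1]
q = [0.5, 0.4, 0.1]
for k in (1, 2, 4, 8):
    # Acceptance rises monotonically with the list size k.
    print(k, round(list_accept_prob(p, q, k), 3))
```

With k = 1 this reduces to the familiar independent-coupling probability sum_i p_i q_i; the list-matching lemma in the paper lower-bounds what the coupled construction achieves instead.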
Problem

Research questions and friction points this paper is trying to address.

How can Gumbel-max sampling be extended to couple a list of samples from one distribution with a single sample from another?
Can multi-draft speculative sampling be made both competitive on language tasks and invariant to the choice of drafter?
How should a source be compressed for multiple decoders that each hold independent side information?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized Gumbel-max sampling with a list-matching lemma that lower-bounds the acceptance probability
A multi-draft speculative sampling mechanism with drafter-invariance guarantees
A distributed lossy compression technique based on generalized Gumbel-max sampling
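One way to picture the drafter-invariance property: if the emitted token is always the target-coupled sample, its distribution is exactly the target's regardless of the draft model, and acceptance only determines whether a draft candidate matched. The sketch below follows that idea with shared Gumbel noise; it is an illustration under our own assumptions, not the paper's exact algorithm (distributions, function name, and the choice to couple the target to the first candidate's noise are all hypothetical):

```python
import math
import random

def speculative_step(p, q, k, rng):
    """One illustrative multi-draft step: k draft candidates come from q
    via Gumbel-max with fresh noise per candidate, and the emitted token
    is the sample from p coupled to the first candidate's noise. The
    output is always the target-coupled sample, so its distribution is
    exactly p no matter what q is (a simple form of drafter invariance);
    acceptance records whether any candidate matched it."""
    noises = [[-math.log(-math.log(rng.random())) for _ in p]
              for _ in range(k)]
    cands = {max(range(len(q)), key=lambda i: math.log(q[i]) + g[i])
             for g in noises}
    token = max(range(len(p)), key=lambda i: math.log(p[i]) + noises[0][i])
    return token, token in cands

rng = random.Random(1)
p = [0.6, 0.3, 0.1]
q = [0.5, 0.4, 0.1]
trials = 20000
tokens, accepts = zip(*(speculative_step(p, q, 4, rng)
                        for _ in range(trials)))
print(f"P(token=0) ~ {tokens.count(0) / trials:.3f}")   # near p[0] = 0.6
print(f"acceptance rate ~ {sum(accepts) / trials:.3f}")
```

The empirical token frequencies track p regardless of q, while the acceptance rate grows with the number of drafts k.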