Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies high-probability estimation of a discrete distribution $p$ with support size $K$ under KL divergence. We propose a novel estimator based on online-to-batch conversion and suffix averaging. It is the first to simultaneously establish tight high-probability upper and lower bounds on the KL estimation error. Theoretically, the estimator achieves, with probability at least $1-\delta$, a convergence rate of $O\big((K \log\log K + \log(K)\log(1/\delta))/n\big)$, matching the minimax lower bound up to logarithmic factors. Moreover, we provide the first high-probability characterization of the maximum likelihood estimator's performance under both $\chi^2$ and KL divergences. The key innovation lies in constructing a computationally efficient and statistically optimal estimation framework, thereby closing the long-standing gap between high-probability upper and lower bounds for KL divergence estimation.
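A minimal sketch of the online-to-batch-with-suffix-averaging idea described above; the inner add-$\alpha$ (KT-style) online learner and its tuning are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def otb_suffix_estimator(samples, K, alpha=0.5):
    """Sketch: online-to-batch conversion with suffix averaging.

    The inner online learner is add-alpha smoothing (a KT-style
    predictor) -- an assumption for illustration, not necessarily
    the paper's learner or its tuned parameters.
    """
    n = len(samples)
    counts = np.zeros(K)
    iterates = []
    for t, x in enumerate(samples):
        # Prediction of the online learner before observing x_t.
        iterates.append((counts + alpha) / (t + alpha * K))
        counts[x] += 1
    # Suffix averaging: average only the last half of the online
    # iterates, discarding the noisy early-round predictions.
    return np.mean(iterates[n // 2:], axis=0)

# Usage: estimate a random distribution on K = 5 symbols.
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(5))
samples = rng.choice(5, size=2000, p=p)
p_hat = otb_suffix_estimator(samples, K=5)
```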

📝 Abstract
We consider the problem of estimating a discrete distribution $p$ with support of size $K$ and provide both upper and lower bounds that hold with high probability in KL divergence. We prove that in the worst case, for any estimator $\widehat{p}$, with probability at least $\delta$, $\mathrm{KL}(p \,\|\, \widehat{p}) \geq C\max\{K, \ln(K)\ln(1/\delta)\}/n$, where $n$ is the sample size and $C > 0$ is a constant. We introduce a computationally efficient estimator $p^{\mathrm{OTB}}$, based on online-to-batch conversion and suffix averaging, and show that with probability at least $1-\delta$, $\mathrm{KL}(p \,\|\, p^{\mathrm{OTB}}) \leq C(K\log(\log(K)) + \ln(K)\ln(1/\delta))/n$. Furthermore, we show that with sufficiently many observations relative to $\log(1/\delta)$, the maximum likelihood estimator $\bar{p}$ guarantees that with probability at least $1-\delta$, $$ \tfrac{1}{6}\,\chi^2(\bar{p}\,\|\,p) \leq \tfrac{1}{4}\,\chi^2(p\,\|\,\bar{p}) \leq \mathrm{KL}(p\,\|\,\bar{p}) \leq C(K + \log(1/\delta))/n\,, $$ where $\chi^2$ denotes the $\chi^2$-divergence.
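To make these quantities concrete, the following sketch computes the MLE from samples and evaluates the divergences appearing above (it only illustrates the definitions; it is not the paper's experiment). Note that with few samples the MLE can assign zero mass to a symbol with $p_i > 0$, making $\mathrm{KL}(p \,\|\, \bar{p})$ infinite, which is why the MLE guarantee requires sufficiently many observations:

```python
import numpy as np

def kl(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); infinite if q_i = 0 < p_i.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def chi2(p, q):
    # chi^2(p || q) = sum_i (p_i - q_i)^2 / q_i.
    return float(np.sum((p - q) ** 2 / q))

rng = np.random.default_rng(1)
K, n = 10, 10_000
p = rng.dirichlet(np.ones(K))
# Maximum likelihood estimator: empirical frequencies.
mle = np.bincount(rng.choice(K, size=n, p=p), minlength=K) / n

print(kl(p, mle), chi2(mle, p), chi2(p, mle))
```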
Problem

Research questions and friction points this paper is trying to address.

Estimating a discrete distribution under KL divergence with high-probability upper and lower bounds
Designing computationally efficient estimators with near-minimax high-probability accuracy
Characterizing the maximum likelihood estimator's performance under conditions on the sample size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online-to-batch conversion estimator $p^{\mathrm{OTB}}$ (see the sketch below)
Suffix averaging technique for high-probability guarantees
High-probability analysis of the maximum likelihood estimator under KL and $\chi^2$ divergences
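For context, the classical in-expectation online-to-batch step behind the first two items runs as follows (a standard sketch, not the paper's proof): letting $q_t$ denote the online prediction before round $t$ and $\bar{q}_n = \frac{1}{n}\sum_{t=1}^n q_t$ the averaged estimator,
$$ \mathbb{E}\big[\mathrm{KL}(p \,\|\, \bar{q}_n)\big] \leq \frac{1}{n}\sum_{t=1}^{n}\mathbb{E}\big[\mathrm{KL}(p \,\|\, q_t)\big] = \frac{1}{n}\,\mathbb{E}\Big[\sum_{t=1}^{n}\big(\log p(X_t) - \log q_t(X_t)\big)\Big] = \frac{\mathbb{E}[\mathrm{Regret}_n(p)]}{n}, $$
by convexity of $\mathrm{KL}(p \,\|\, \cdot)$ (Jensen) and the independence of $X_t$ from $q_t$, which depends only on $X_1, \dots, X_{t-1}$. The paper's contribution is to replace the full average with a suffix average and thereby turn this in-expectation guarantee into the high-probability bound stated above.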