AI Summary
This work introduces the problem of entropy equivalence testing, a relaxation of closeness testing: given samples from two unknown distributions p and q, distinguish the case p = q from the case where their Shannon entropies differ by at least ε. The authors give a sample- and time-efficient algorithm for this task and show that its optimal sample complexity can be significantly lower than that of closeness testing. As an application, the result yields the first non-trivial closeness testing algorithm for low-degree Bayesian networks, which significantly improves on either the sample or the time complexity of a baseline based on full learning.
Abstract
We introduce the problem of \emph{entropy equivalence testing} for probability distributions, a relaxation of the well-studied closeness testing problem, where the distribution testing algorithm is now only required to distinguish, given samples from two unknown distributions $p,q$ and a parameter $\varepsilon \in (0,1/2]$, between $p=q$ and $|H(p)-H(q)| \geq \varepsilon$ (where $H$ denotes the Shannon entropy). We provide a time- and sample-efficient algorithm for this task, showing that its optimal sample complexity can be significantly lower than that of closeness testing. As an application, we leverage this result to provide the first non-trivial testing algorithm for (standard) closeness of low-degree \emph{Bayesian networks}, which significantly improves on either the sample or time complexity of a baseline based on full learning.
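To make the problem statement concrete, the following is a minimal sketch of a naive plug-in baseline for the promise problem above. It is not the algorithm from the paper, and it makes no claim to the sample complexity the paper achieves. The function names, the acceptance threshold $\varepsilon/2$, and the use of natural logarithms (entropy in nats) are assumptions made only for illustration.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Empirical (plug-in) Shannon entropy of a sample, in nats (assumed unit)."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def naive_entropy_equivalence_test(samples_p, samples_q, eps):
    """Hypothetical baseline: accept (True) when the empirical entropies are
    within eps / 2 of each other, reject (False) otherwise.

    Under the promise that either p = q or |H(p) - H(q)| >= eps, this is
    correct whenever both plug-in estimates are accurate to within eps / 4.
    """
    gap = abs(plugin_entropy(samples_p) - plugin_entropy(samples_q))
    return gap < eps / 2


# Illustrative usage with a fixed seed: uniform on 4 symbols vs. a point mass.
rng = random.Random(0)
uniform_a = [rng.randrange(4) for _ in range(5000)]
uniform_b = [rng.randrange(4) for _ in range(5000)]
point_mass = [0] * 5000

same = naive_entropy_equivalence_test(uniform_a, uniform_b, eps=0.5)
far = naive_entropy_equivalence_test(uniform_a, point_mass, eps=0.5)
```

The plug-in estimator needs enough samples to estimate each entropy separately, which is the kind of cost a dedicated tester would aim to avoid; the sketch only illustrates the accept/reject semantics of the two cases.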