Tolerant Testing of High-Dimensional Samplers with Subcube Conditioning

📅 2023-08-08

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

This paper studies tolerant closeness testing of two distributions $P$ and $Q$ over the high-dimensional Boolean hypercube $\{0,1\}^n$ under the subcube conditional sampling (SUBCOND) model: distinguishing whether $\|P - Q\|_1 \leq \varepsilon_1$ or $\|P - Q\|_1 \geq \varepsilon_2$, for $0 \leq \varepsilon_1 < \varepsilon_2$. Prior SUBCOND-based work addressed only the non-tolerant case ($\varepsilon_1 = 0$), with query complexity $\widetilde{O}(n^{5}/\varepsilon_2^{5})$. The paper designs a tolerant SUBCOND tester that works for any $\varepsilon_1 \geq 0$, with query complexity $\widetilde{O}(n^{3}/(\varepsilon_2 - \varepsilon_1)^{5})$. This bound is polynomial in $n$, in contrast to the exponential lower bounds known for the standard sampling model, and it improves the dependence on $n$ from $n^5$ to $n^3$.
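To make the decision problem concrete, here is a minimal Python sketch that computes the $L_1$ distance exactly for a tiny $n$ and reports which verdict a tolerant tester would owe. It is not the paper's algorithm: it enumerates all $2^n$ points, which is exactly what becomes infeasible in high dimensions. The helper names (`product_pmf`, `l1_distance`, `required_verdict`) are illustrative assumptions.

```python
from itertools import product


def product_pmf(biases):
    """Build the pmf of a product distribution; biases[i] = Pr[x_i = 1]."""
    pmf = {}
    for x in product((0, 1), repeat=len(biases)):
        prob = 1.0
        for bit, bias in zip(x, biases):
            prob *= bias if bit == 1 else 1.0 - bias
        pmf[x] = prob
    return pmf


def l1_distance(p, q, n):
    """Exact L1 distance between two pmfs over {0,1}^n (brute force)."""
    return sum(
        abs(p.get(x, 0.0) - q.get(x, 0.0))
        for x in product((0, 1), repeat=n)
    )


def required_verdict(distance, eps1, eps2):
    """Verdict owed by a tolerant tester; None means the gap (any answer is fine)."""
    if not 0 <= eps1 < eps2:
        raise ValueError("need 0 <= eps1 < eps2")
    if distance <= eps1:
        return "close"
    if distance >= eps2:
        return "far"
    return None


# Toy instance: uniform distribution vs. a product distribution
# whose last coordinate is biased toward 1.
p = product_pmf([0.5, 0.5, 0.5])
q = product_pmf([0.5, 0.5, 0.75])
d = l1_distance(p, q, 3)
print(d, required_verdict(d, eps1=0.1, eps2=0.4))
```

Setting `eps1=0` recovers the non-tolerant case, where the tester only has to accept when the two distributions are identical.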

📝 Abstract

We study the tolerant testing problem for high-dimensional samplers. Given as input two samplers $\mathcal{P}$ and $\mathcal{Q}$ over the $n$-dimensional space $\{0,1\}^n$, and two parameters $\varepsilon_2 > \varepsilon_1$, the goal of tolerant testing is to test whether the distributions generated by $\mathcal{P}$ and $\mathcal{Q}$ are $\varepsilon_1$-close or $\varepsilon_2$-far. Since exponential lower bounds (in $n$) are known for the problem in the standard sampling model, research has focused on models where one can draw \textit{conditional} samples. Among these models, \textit{subcube conditioning} ($\mathsf{SUBCOND}$), which allows conditioning on arbitrary subcubes of the domain, holds the promise of widespread adoption in practice owing to its ability to capture the natural behavior of samplers in constrained domains. To translate the promise into practice, we need to overcome two crucial roadblocks for tests based on $\mathsf{SUBCOND}$: the prohibitively large number of queries ($\tilde{\mathcal{O}}(n^5/\varepsilon_2^5)$) and limitation to non-tolerant testing (i.e., $\varepsilon_1 = 0$). The primary contribution of this work is to overcome the above challenges: we design a new tolerant testing methodology (i.e., $\varepsilon_1 \geq 0$) that allows us to significantly improve the upper bound to $\tilde{\mathcal{O}}(n^3/(\varepsilon_2-\varepsilon_1)^5)$.
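A subcube of $\{0,1\}^n$ is obtained by fixing some coordinates and leaving the rest free, and a SUBCOND query returns a sample from the distribution conditioned on that subcube. The following Python sketch simulates such an oracle for an explicitly given toy distribution. The class name `SubcondOracle` and its interface are assumptions made for illustration; a real sampler would answer the query internally rather than by enumeration.

```python
import random
from itertools import product


class SubcondOracle:
    """Toy SUBCOND oracle for an explicit pmf over {0,1}^n (hypothetical interface)."""

    def __init__(self, pmf, n, seed=None):
        self.pmf = pmf          # dict: n-bit tuple -> probability
        self.n = n
        self.rng = random.Random(seed)

    def sample(self, fixed=None):
        """Draw one point conditioned on x[i] == b for every (i, b) in `fixed`."""
        fixed = fixed or {}
        # Enumerate the subcube: all points that agree with the fixed coordinates.
        points = [
            x for x in product((0, 1), repeat=self.n)
            if all(x[i] == b for i, b in fixed.items())
        ]
        weights = [self.pmf.get(x, 0.0) for x in points]
        if sum(weights) <= 0.0:
            raise ValueError("subcube has zero probability under this pmf")
        # random.choices renormalizes the weights, which is exactly conditioning.
        return self.rng.choices(points, weights=weights, k=1)[0]


# Uniform distribution on {0,1}^3, queried on the subcube x_0 = 1, x_2 = 0.
uniform = {x: 1 / 8 for x in product((0, 1), repeat=3)}
oracle = SubcondOracle(uniform, n=3, seed=0)
print(oracle.sample({0: 1, 2: 0}))
```

Passing an empty restriction gives an ordinary sample, so the standard sampling model is the special case in which no coordinate is fixed.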

Problem

Research questions and friction points this paper is trying to address.

Estimate statistical distance between high-dimensional discrete distributions

Overcome exponential lower bounds in standard sampling model

Develop polynomial query algorithm using subcube conditional sampling
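As a rough sense of scale for the polynomial bounds, the sketch below evaluates the two leading terms quoted in the abstract, $n^5/\varepsilon_2^5$ and $n^3/(\varepsilon_2-\varepsilon_1)^5$. It drops the polylogarithmic factors and constants hidden by $\tilde{\mathcal{O}}$, so the numbers are orders of magnitude only, not actual query counts. The function names are illustrative.

```python
def prior_bound(n, eps2):
    """Leading term of the earlier non-tolerant bound: n^5 / eps2^5."""
    return n**5 / eps2**5


def tolerant_bound(n, eps1, eps2):
    """Leading term of the tolerant bound: n^3 / (eps2 - eps1)^5."""
    if not 0 <= eps1 < eps2:
        raise ValueError("need 0 <= eps1 < eps2")
    return n**3 / (eps2 - eps1) ** 5


n, eps2 = 100, 0.5
# With eps1 = 0 the two settings coincide, and the ratio of leading terms is n^2.
print(prior_bound(n, eps2) / tolerant_bound(n, 0.0, eps2))
# With eps1 > 0 only the tolerant bound applies; it grows as the gap shrinks.
print(tolerant_bound(n, 0.25, eps2))
```

The tolerant bound depends on the gap $\varepsilon_2 - \varepsilon_1$ rather than on $\varepsilon_2$ alone, so a narrow gap is costly even when $\varepsilon_2$ is large.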

Innovation

Methods, ideas, or system contributions that make the work stand out.

Subcube conditional sampling for distance estimation

Polynomial query algorithm in high dimensions

Statistical distance with additive tolerance approximation
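One standard way to see why subcube conditioning is useful for distance estimation is the chain rule $P(x) = \prod_{i=1}^{n} P(x_i \mid x_1, \dots, x_{i-1})$: each factor is a one-bit marginal on a subcube, which conditional samples can estimate. The Python sketch below illustrates that idea on a toy explicit distribution. It is a generic illustration under these assumptions, not the tester from the paper, and `estimate_probability` and its parameters are hypothetical names.

```python
import random
from itertools import product


def estimate_probability(pmf, n, x, samples_per_step=2000, seed=0):
    """Estimate P(x) via the chain rule, using simulated subcube-conditional samples."""
    rng = random.Random(seed)
    estimate = 1.0
    fixed = {}  # coordinates fixed so far: the prefix x_0 .. x_{i-1}
    for i in range(n):
        # Simulate SUBCOND on the subcube defined by the current prefix.
        points = [
            y for y in product((0, 1), repeat=n)
            if all(y[j] == b for j, b in fixed.items())
        ]
        weights = [pmf.get(y, 0.0) for y in points]
        draws = rng.choices(points, weights=weights, k=samples_per_step)
        # Empirical estimate of P(x_i | prefix).
        hits = sum(1 for y in draws if y[i] == x[i])
        if hits == 0:
            return 0.0  # the next, deeper subcube may have zero mass; stop here
        estimate *= hits / samples_per_step
        fixed[i] = x[i]
    return estimate


uniform = {y: 1 / 8 for y in product((0, 1), repeat=3)}
print(estimate_probability(uniform, 3, (1, 0, 1)))
```

The sketch uses `n * samples_per_step` conditional samples per point, and the per-step errors multiply, which gives some intuition for why the query bounds carry polynomial factors in both $n$ and the accuracy parameters.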

🔎 Similar Papers

Diffusion-Based Failure Sampling for Cyber-Physical Systems