🤖 AI Summary
Distribution verification of samplers over infinite domains (e.g., the natural numbers) has long been hindered by worst-case computational complexity, rendering existing methods impractical. This paper introduces the first instance-dependent, efficient testing framework, built on an interval-conditioning mechanism that estimates the distance between an unknown and a known distribution by reducing the problem to probability mass estimation for a continuous distribution—thereby decoupling test efficiency from worst-case guarantees. The method supports rigorous statistical verification of samplers over countably infinite domains while remaining computationally tractable. Empirical evaluation demonstrates up to a 1000× speedup over state-of-the-art approaches across diverse distribution families, significantly improving the practicality, scalability, and deployability of sampler validation.
📝 Abstract
Sampling algorithms play a pivotal role in probabilistic AI. However, verifying whether a sampler program indeed samples from the claimed distribution is a notoriously hard problem. Provably correct testers such as Barbarik, Teq, Flash, and CubeProbe, each targeting a different kind of sampler, were proposed only in the last few years. All these testers focus on worst-case efficiency and do not support verification of samplers over infinite domains, a setting that arises frequently in astronomy, finance, network security, and other fields.
In this work, we design the first tester of samplers with instance-dependent efficiency, allowing us to test samplers over the natural numbers. Our tests are built on a novel algorithm for estimating the distance between an unknown and a known probability distribution using an interval-conditioning framework. The core technical contribution is a new connection to probability mass estimation for a continuous distribution. The practical gains are also substantial: our experiments demonstrate up to a 1000× speedup over state-of-the-art testers.
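To make the interval-conditioning idea concrete, here is a minimal, hypothetical sketch (all names and structure are illustrative assumptions, not the paper's actual algorithm): condition samples from a black-box sampler to a finite interval of the naturals, then compare the empirical conditional frequencies against the known distribution's conditional probabilities on that interval.

```python
import random
from math import exp

def known_pmf(k, lam=1.0):
    # Known reference distribution over the naturals
    # (a geometric-like law; illustrative choice).
    return (1 - exp(-lam)) * exp(-lam * k)

def unknown_sampler():
    # Stand-in for the black-box sampler under test. Here it samples
    # from the same law via inverse-transform sampling, so the
    # estimated distance should be close to zero.
    u = random.random()
    k, acc = 0, known_pmf(0)
    while u > acc:
        k += 1
        acc += known_pmf(k)
    return k

def conditioned_tv_estimate(sampler, pmf, lo, hi, n_samples=20000):
    # Interval conditioning: keep only samples that land in [lo, hi],
    # then estimate the total variation distance between the empirical
    # conditional distribution and the known conditional distribution.
    hits = [0] * (hi - lo + 1)
    kept = 0
    for _ in range(n_samples):
        x = sampler()
        if lo <= x <= hi:
            hits[x - lo] += 1
            kept += 1
    if kept == 0:
        return None  # interval mass too small to estimate
    mass = sum(pmf(k) for k in range(lo, hi + 1))
    return 0.5 * sum(abs(hits[k - lo] / kept - pmf(k) / mass)
                     for k in range(lo, hi + 1))

random.seed(0)
est = conditioned_tv_estimate(unknown_sampler, known_pmf, 0, 10)
print(est)
```

Because the sampler under test matches the reference here, the estimate is small; a faulty sampler would inflate it on intervals where the two laws disagree. The paper's actual tester additionally provides instance-dependent sample-complexity guarantees, which this sketch does not capture.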