🤖 AI Summary
Testing probabilistic programs is challenging because their inherent randomness makes it hard to determine how many executions are needed to reliably check expected behaviors. This paper introduces ProbTest, a black-box unit testing method grounded in the classical coupon collector's problem, a novel application of this combinatorial model to probabilistic program testing. ProbTest automatically derives the number of runs needed to reach a user-specified statistical confidence level (e.g., 95%), eliminating manual threshold tuning. It is implemented as a PyTest plug-in and is compatible with standard test-case authoring. The evaluation, on case studies from the Gymnasium reinforcement learning library and a randomized data structure, indicates that ProbTest improves test reliability and automation. Its results come with statistical guarantees, while the workflow stays practical for everyday unit testing.
📝 Abstract
Testing probabilistic programs is non-trivial due to their stochastic nature. Given an input, the program may produce different outcomes depending on its underlying stochastic choices. Testing the expected outcomes of a probabilistic program therefore requires repeated test executions, unlike a deterministic program, where a single execution may suffice for each test input. This raises the following question: how many times should we run a probabilistic program to test it effectively? This work proposes a novel black-box unit testing method, ProbTest, for testing the outcomes of probabilistic programs. Our method is founded on the theory surrounding a well-known combinatorial problem, the coupon collector's problem. Using this method, developers can write unit tests as usual, without extra effort, while the number of required test executions is determined automatically with statistical guarantees for the results. We implement ProbTest as a plug-in for PyTest, a well-known unit testing tool for Python programs. With this plug-in, developers write unit tests as they would for any other Python program, and the necessary test executions are handled automatically. We evaluate the method on case studies from the Gymnasium reinforcement learning library and a randomized data structure.
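To make the question "how many runs?" concrete, the following is a minimal sketch of a coupon-collector-style calculation. It is not ProbTest's actual formula or API. It assumes the program has `n_outcomes` distinct expected outcomes and that each occurs with probability at least `p_min` on every run. Under that assumption, a union bound gives a probability of at most `n_outcomes * (1 - p_min)**m` of missing some outcome after `m` independent runs. The names `runs_needed` and `observe_outcomes` are hypothetical helpers introduced only for illustration.

```python
import math
import random


def runs_needed(n_outcomes: int, p_min: float, confidence: float) -> int:
    """Number of runs m such that n_outcomes * (1 - p_min)**m <= 1 - confidence.

    Assumption (not taken from the paper): every expected outcome has
    probability at least p_min on each independent run.
    """
    if n_outcomes < 1:
        raise ValueError("n_outcomes must be at least 1")
    if not 0.0 < p_min < 1.0:
        raise ValueError("p_min must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    delta = 1.0 - confidence  # allowed probability of missing an outcome
    # Solve n_outcomes * (1 - p_min)**m <= delta for m and round up.
    return math.ceil(math.log(delta / n_outcomes) / math.log(1.0 - p_min))


def observe_outcomes(program, runs: int) -> set:
    """Run a zero-argument probabilistic program `runs` times; collect its outcomes."""
    return {program() for _ in range(runs)}


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    m = runs_needed(n_outcomes=6, p_min=1 / 6, confidence=0.95)
    seen = observe_outcomes(lambda: rng.randint(1, 6), m)
    print(f"runs: {m}, outcomes observed: {sorted(seen)}")
```

For a fair six-sided die at 95% confidence, this bound asks for 27 runs. Because the union bound is conservative, the exact number of runs required may be somewhat smaller.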