ProbTest: Unit Testing for Probabilistic Programs (Extended Version)

📅 2025-09-02

🤖 AI Summary

Testing probabilistic programs is hard because the same input can yield different outcomes, so it is unclear how many executions are needed to check expected behavior reliably. This paper introduces ProbTest, a black-box unit testing method built on the theory of the coupon collector's problem. ProbTest automatically determines the number of test executions required and provides statistical guarantees for the results, so developers do not have to choose a repetition count by hand. It is implemented as a PyTest plug-in, and tests are written like ordinary Python unit tests. The method is evaluated on case studies from the Gymnasium reinforcement learning library and a randomized data structure.

📝 Abstract

Testing probabilistic programs is non-trivial due to their stochastic nature. Given an input, the program may produce different outcomes depending on its underlying stochastic choices. Testing the expected outcomes of a probabilistic program therefore requires repeated test executions, unlike a deterministic program, where a single execution may suffice for each test input. This raises the following question: how many times should we run a probabilistic program to test it effectively? This work proposes a novel black-box unit testing method, ProbTest, for testing the outcomes of probabilistic programs. Our method is founded on the theory surrounding a well-known combinatorial problem, the coupon collector's problem. Using this method, developers can write unit tests as usual without extra effort, while the number of required test executions is determined automatically with statistical guarantees for the results. We implement ProbTest as a plug-in for PyTest, a well-known unit testing tool for Python programs. Using this plug-in, developers can write unit tests as they would for any other Python program, and the necessary test executions are handled automatically. We evaluate the method on case studies from the Gymnasium reinforcement learning library and a randomized data structure.
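To make the "how many runs" question concrete, here is a minimal sketch based on a standard coupon-collector tail bound. It is not taken from the paper and may differ from the formula ProbTest actually uses. The function name runs_needed and its parameters are hypothetical. The sketch assumes the program has k possible outcomes, each occurring with probability at least p_min on every run, and that runs are independent. By the union bound, the probability of missing at least one outcome in n runs is at most k(1 - p_min)^n, so requiring this to be at most 1 - confidence gives n >= ln(k / (1 - confidence)) / -ln(1 - p_min).

```python
import math


def runs_needed(num_outcomes: int, min_prob: float, confidence: float = 0.95) -> int:
    """Upper bound on the runs needed to observe every outcome at least once.

    Hypothetical helper (not ProbTest's API). Assumes independent runs and that
    each of the `num_outcomes` outcomes has probability >= `min_prob` per run.
    """
    if num_outcomes < 1:
        raise ValueError("num_outcomes must be at least 1")
    if not 0.0 < min_prob <= 1.0:
        raise ValueError("min_prob must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if min_prob == 1.0:
        # A single outcome that always occurs is seen on the first run.
        return 1

    delta = 1.0 - confidence  # allowed probability of missing some outcome
    # Solve num_outcomes * (1 - min_prob) ** n <= delta for n.
    n = math.log(num_outcomes / delta) / -math.log1p(-min_prob)
    return max(1, math.ceil(n))
```

Under these assumptions, a fair six-sided die (6 outcomes, each with probability 1/6) at 95% confidence gives runs_needed(6, 1/6) == 27. The union bound is conservative, so the true minimum can be somewhat smaller.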

Problem

Research questions and friction points this paper is trying to address.

Determining required test executions for probabilistic programs

Providing statistical guarantees for unit test results automatically

Testing stochastic program outputs without extra developer effort

Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box unit testing for probabilistic programs

Leverages coupon collector's problem theory

Automatically determines required test executions
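The black-box idea can be illustrated with a plain repeated-run check. This is a sketch only: it does not use or reproduce the ProbTest plug-in, and the names all_outcomes_observed and test_die_shows_every_face are hypothetical. The run budget of 27 is an assumption for illustration, chosen because 6 * (5/6)^27 is about 0.044, so a fair die misses some face within 27 rolls with probability below 5%.

```python
import random


def all_outcomes_observed(program, expected, runs):
    """Call `program` `runs` times, looking only at its return values.

    Returns True if every expected outcome was seen and nothing unexpected
    appeared. Hypothetical helper; treats `program` as a black box.
    """
    expected = set(expected)
    seen = set()
    for _ in range(runs):
        outcome = program()
        if outcome not in expected:
            return False  # the program produced an outcome the test does not allow
        seen.add(outcome)
    return seen == expected


def test_die_shows_every_face():
    # Program under test: a fair six-sided die.
    def roll():
        return random.randint(1, 6)

    # 27 runs is an assumed budget (see the text above), not a ProbTest value.
    # The test can still fail by chance with probability below 5%.
    assert all_outcomes_observed(roll, {1, 2, 3, 4, 5, 6}, runs=27)
```

The point of an automated approach is to remove the hand-picked constant: the repetition count follows from the stated confidence level instead of being tuned per test.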
