Assessing Reliability of Statistical Maximum Coverage Estimators in Fuzzing

📅 2025-07-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing fuzz testing research lacks reliable ground-truth benchmarks for evaluating statistical maximum coverage estimators, as precise reachability labels are infeasible to obtain for real-world programs, hindering rigorous accuracy validation. To address this, we propose: (1) the first large-scale synthetic benchmark generation framework supporting complex control-flow structures and equipped with exact reachability labels; (2) a novel label-free reliability assessment protocol leveraging dynamic sampling-unit transformations and species richness modeling for statistical inference; and (3) a verifiable upper-bound estimation method integrating control-flow graph synthesis with dynamic sampling analysis. Experiments demonstrate consistent performance evaluation of existing estimators on both synthetic and real programs. Our work establishes the first reproducible, empirically verifiable statistical coverage evaluation benchmark—providing a rigorous foundation for future research in coverage estimation and fuzz testing evaluation.

Technology Category

Application Category

📝 Abstract
Background: Fuzzers are often guided by coverage, making the estimation of maximum achievable coverage a key concern in fuzzing. However, achieving 100% coverage is infeasible for most real-world software systems, regardless of effort. While static reachability analysis can provide an upper bound, it is often highly inaccurate. Recently, statistical estimation methods based on species richness estimators from biostatistics have been proposed as a potential solution. Yet, the lack of reliable benchmarks with labeled ground truth has limited rigorous evaluation of their accuracy. Objective: This work examines the reliability of reachability estimators from two axes: addressing the lack of labeled ground truth and evaluating their reliability on real-world programs. Methods: (1) To address the challenge of labeled ground truth, we propose an evaluation framework that synthetically generates large programs with complex control flows, ensuring well-defined reachability and providing ground truth for evaluation. (2) To address the criticism from use of synthetic benchmarks, we adapt a reliability check for reachability estimators on real-world benchmarks without labeled ground truth -- by varying the size of sampling units, which, in theory, should not affect the estimate. Results: These two studies together will help answer the question of whether current reachability estimators are reliable, and defines a protocol to evaluate future improvements in reachability estimation.
Problem

Research questions and friction points this paper is trying to address.

Evaluates reliability of statistical maximum coverage estimators in fuzzing
Addresses lack of labeled ground truth for estimator validation
Assesses estimator accuracy on real-world and synthetic programs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic programs with ground truth generation
Reliability check via sampling unit variation
Evaluation framework for reachability estimators
🔎 Similar Papers
No similar papers found.
Danushka Liyanage
Danushka Liyanage
Postdoctoral Research Fellow, University of Sydney
Database TechnologySoftware SecurityStatistics and Probability TheoryMachine Learning
N
Nelum Attanayake
School of Computer Science, University of Sydney, Australia
Z
Zijian Luo
School of Computer Science, University of Sydney, Australia
Rahul Gopinath
Rahul Gopinath
The University of Sydney
CybersecuritySoftware Engineering