Assessing Reliability of Statistical Maximum Coverage Estimators in Fuzzing

📅 2025-07-22

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing fuzz testing research lacks reliable ground-truth benchmarks for evaluating statistical maximum coverage estimators, as precise reachability labels are infeasible to obtain for real-world programs, hindering rigorous accuracy validation. To address this, we propose: (1) the first large-scale synthetic benchmark generation framework supporting complex control-flow structures and equipped with exact reachability labels; (2) a novel label-free reliability assessment protocol leveraging dynamic sampling-unit transformations and species richness modeling for statistical inference; and (3) a verifiable upper-bound estimation method integrating control-flow graph synthesis with dynamic sampling analysis. Experiments demonstrate consistent performance evaluation of existing estimators on both synthetic and real programs. Our work establishes the first reproducible, empirically verifiable statistical coverage evaluation benchmark—providing a rigorous foundation for future research in coverage estimation and fuzz testing evaluation.


📝 Abstract

Background: Fuzzers are often guided by coverage, making the estimation of maximum achievable coverage a key concern in fuzzing. However, achieving 100% coverage is infeasible for most real-world software systems, regardless of effort. While static reachability analysis can provide an upper bound, it is often highly inaccurate. Recently, statistical estimation methods based on species richness estimators from biostatistics have been proposed as a potential solution. Yet the lack of reliable benchmarks with labeled ground truth has limited rigorous evaluation of their accuracy. Objective: This work examines the reliability of reachability estimators along two axes: addressing the lack of labeled ground truth and evaluating their reliability on real-world programs. Methods: (1) To address the lack of labeled ground truth, we propose an evaluation framework that synthetically generates large programs with complex control flows, ensuring well-defined reachability and providing ground truth for evaluation. (2) To address criticism of the use of synthetic benchmarks, we adapt a reliability check for reachability estimators on real-world benchmarks without labeled ground truth: varying the size of sampling units, which in theory should not affect the estimate. Results: Together, these two studies will help answer the question of whether current reachability estimators are reliable, and define a protocol to evaluate future improvements in reachability estimation.
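The sampling-unit check described in the Methods can be illustrated with a small sketch. The abstract does not name a specific estimator, so the code below assumes the bias-corrected Chao2 incidence estimator, a standard species richness estimator, purely as a stand-in. The function names (`group_into_units`, `chao2`, `estimates_by_unit_size`) and the representation of coverage as one set of branch IDs per fuzzer input are hypothetical choices for illustration, not the paper's implementation.

```python
from collections import Counter
from typing import Dict, List, Set


def group_into_units(per_input_coverage: List[Set[int]], unit_size: int) -> List[Set[int]]:
    """Merge consecutive inputs into sampling units of up to `unit_size` inputs."""
    units = []
    for start in range(0, len(per_input_coverage), unit_size):
        chunk = per_input_coverage[start:start + unit_size]
        # A unit "observes" a branch if any input inside the unit covers it.
        units.append(set().union(*chunk))
    return units


def chao2(units: List[Set[int]]) -> float:
    """Bias-corrected Chao2 estimate of the number of reachable branches.

    Assumption: Chao2 is used here only as an example estimator.
    """
    t = len(units)
    if t == 0:
        return 0.0
    incidence = Counter()
    for unit in units:
        incidence.update(unit)
    s_obs = len(incidence)                                # branches seen at least once
    q1 = sum(1 for c in incidence.values() if c == 1)     # seen in exactly one unit
    q2 = sum(1 for c in incidence.values() if c == 2)     # seen in exactly two units
    return s_obs + ((t - 1) / t) * q1 * (q1 - 1) / (2 * (q2 + 1))


def estimates_by_unit_size(per_input_coverage: List[Set[int]], sizes: List[int]) -> Dict[int, float]:
    """Re-run the estimator for each sampling-unit size so the results can be compared."""
    return {
        size: chao2(group_into_units(per_input_coverage, size))
        for size in sizes
    }
```

Under the premise stated in the abstract, the values returned by `estimates_by_unit_size` for different sizes should stay close to one another; a large spread would be a sign that the estimator is sensitive to an arbitrary choice of sampling unit.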

Problem

Research questions and friction points this paper is trying to address.

Evaluates reliability of statistical maximum coverage estimators in fuzzing

Addresses lack of labeled ground truth for estimator validation

Assesses estimator accuracy on real-world and synthetic programs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic program generation with ground-truth reachability labels

Reliability check via sampling unit variation

Evaluation framework for reachability estimators
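To make the idea of ground truth by construction concrete, here is a toy sketch. It does not generate programs as the paper's framework does; it assumes a randomly generated control-flow graph as a simplified stand-in, where the exact set of reachable nodes follows from a breadth-first search. All names (`random_cfg`, `reachable_from`, `relative_error`) are hypothetical.

```python
import random
from collections import deque
from typing import Dict, Set


def random_cfg(n_nodes: int, n_edges: int, seed: int) -> Dict[int, Set[int]]:
    """Build a random directed graph as a toy stand-in for a synthetic program's CFG."""
    rng = random.Random(seed)  # seeded so the benchmark is reproducible
    edges: Dict[int, Set[int]] = {node: set() for node in range(n_nodes)}
    for _ in range(n_edges):
        src = rng.randrange(n_nodes)
        dst = rng.randrange(n_nodes)
        edges[src].add(dst)
    return edges


def reachable_from(edges: Dict[int, Set[int]], entry: int = 0) -> Set[int]:
    """Exact reachability labels: every node reachable from the entry node (BFS)."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for succ in edges[node]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def relative_error(estimate: float, truth: int) -> float:
    """Signed relative error of an estimate against the known ground truth."""
    return (estimate - truth) / truth
```

With labels like these available, an estimator's output can be scored directly, for example `relative_error(estimate, len(reachable_from(cfg)))`, which is the kind of comparison that is not possible on real-world programs without labeled ground truth.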
