Quality-Assured Fuzz Harness Generation via the Four Principles Framework

📅 2026-05-20

🤖 AI Summary

Automatically generated fuzzing drivers (harnesses) often contain logic errors, API misuse, and lifecycle violations, because existing generators lack systematic quality assurance at the source level. This work proposes QuartetFuzz, which gives the first source-level definition of harness correctness and builds a generate-check-fix loop on the Four Principles framework: Logic Correctness (P1), API Protocol Compliance (P2), Security Boundary Respect (P3), and Entry Point Adequacy (P4). Each principle comes with a mathematical specification and an implementable check, and an autonomous LLM agent applies them during generation, before any fuzzing begins. Deployed on 23 open-source projects spanning C/C++, Java, and JavaScript, the system submitted 42 bug reports, of which 29 were fixed or confirmed upstream (including three CVEs) and 2 were rejected, a false-positive rate of 4.8%. Used as an auditor on 586 existing production harnesses across 70 projects, it identified 53 violations, of which 45 were confirmed and 35 fixed.

📝 Abstract

Fuzz testing is the dominant technique for finding memory-safety vulnerabilities in C/C++ software, yet its effectiveness hinges on the quality of fuzz harnesses -- the programs that bridge fuzzers and library APIs. A growing body of tools now automate harness generation, but none systematically ensures the correctness of produced harnesses: logic errors, API misuse, and lifecycle violations go undetected at the source level. As LLM-driven generation scales harness creation, uncontrolled quality turns scale into a liability. We present QuartetFuzz, an autonomous harness-generation system that systematically improves correctness throughout the generation process. At its core is the Four Principles framework -- Logic Correctness (P1), API Protocol Compliance (P2), Security Boundary Respect (P3), and Entry Point Adequacy (P4) -- the first source-level definition of harness correctness with mathematical specifications and implementable checks. We operationalize these principles in an autonomous LLM agent that produces harnesses satisfying P1-P4 through a generate-check-fix loop before any fuzzing begins. Deployed on 23 open-source projects spanning C/C++, Java, and JavaScript, the system submits 42 bug reports, of which 29 are fixed or confirmed upstream (including 3 CVEs) and only 2 are rejected (4.8% FP rate). During generation, the built-in P1/P2 checks automatically intercepted 58 harness-induced crashes that would otherwise have been false positives. Applied as a quality auditor to 586 existing production harnesses across 70 projects, the system identifies 53 violations (45 confirmed, 35 fixed). We release a dataset of 100 labeled harnesses for reproducible evaluation. Code and dataset are available at https://github.com/OwenSanzas/QuartetFuzz
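
The generate-check-fix loop described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the QuartetFuzz implementation: the `Violation` type, the two toy checks, the stub generator and fixer, and the library calls `ctx_new`, `ctx_parse`, and `ctx_free` are all hypothetical. Only `LLVMFuzzerTestOneInput`, the standard libFuzzer entry point, is a real name. In the actual system an LLM agent would play the role of the generator and fixer, and the checks would implement the paper's specifications for P1-P4.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Violation:
    principle: str  # "P1" .. "P4" (labels from the abstract)
    message: str


# A check inspects harness source text and returns the violations it finds.
Check = Callable[[str], List[Violation]]


def check_p2_lifecycle(source: str) -> List[Violation]:
    """Toy P2 check: every hypothetical ctx_new() needs a ctx_free()."""
    if source.count("ctx_new(") > source.count("ctx_free("):
        return [Violation("P2", "ctx_new() without matching ctx_free()")]
    return []


def check_p4_entry_point(source: str) -> List[Violation]:
    """Toy P4 check: the harness must define the libFuzzer entry point."""
    if "LLVMFuzzerTestOneInput" not in source:
        return [Violation("P4", "no LLVMFuzzerTestOneInput entry point")]
    return []


def run_checks(source: str, checks: List[Check]) -> List[Violation]:
    return [v for check in checks for v in check(source)]


def generate_check_fix(
    generate: Callable[[], str],
    fix: Callable[[str, List[Violation]], str],
    checks: List[Check],
    max_rounds: int = 3,
) -> Tuple[str, List[Violation]]:
    """Generate a harness, then repair it until all checks pass or rounds run out."""
    source = generate()
    for _ in range(max_rounds):
        violations = run_checks(source, checks)
        if not violations:
            return source, []
        source = fix(source, violations)
    return source, run_checks(source, checks)


# Stand-ins for the LLM agent (assumptions for illustration only).
DRAFT = """int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  ctx_t *c = ctx_new();
  ctx_parse(c, data, size);
  return 0;
}
"""


def stub_generate() -> str:
    return DRAFT  # first draft leaks the context: a lifecycle violation


def stub_fix(source: str, violations: List[Violation]) -> str:
    if any(v.principle == "P2" for v in violations):
        return source.replace("  return 0;", "  ctx_free(c);\n  return 0;")
    return source


if __name__ == "__main__":
    harness, remaining = generate_check_fix(
        stub_generate, stub_fix, [check_p2_lifecycle, check_p4_entry_point]
    )
    print(harness)
    print("remaining violations:", remaining)
```

The point of the structure is that checking happens on the harness source before any fuzzing run, so a harness-induced crash such as the leaked context above is caught at generation time instead of surfacing later as a false-positive bug report.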

Problem

Research questions and friction points this paper is trying to address.

fuzz harness

correctness

API misuse

logic error

lifecycle violation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Four Principles Framework

fuzz harness generation

LLM-based verification