🤖 AI Summary
This paper investigates whether fuzz harnesses in continuous fuzz testing degrade as software projects evolve, specifically through reduced code coverage and diminished vulnerability detection capability.
Method: Leveraging 510 C/C++ projects from OSS-Fuzz, we combine static and dynamic analysis, SanitizerCoverage-guided instrumentation, historical build log mining, and manual root-cause categorization to systematically study harness evolution.
Contribution/Results: We present the first systematic evidence that fuzz harnesses are unexpectedly robust without active maintenance, as long as they still build. We propose and empirically validate the first taxonomy of harness degradation root causes. Furthermore, we design and implement a metric-driven monitoring framework, integrated into both OSS-Fuzz and Fuzz Introspector, that enables automated degradation detection. Our findings show that most harnesses remain stable over extended periods, and we identify recurrent degradation patterns (e.g., API mismatches, dead-code elimination). The methodology has been deployed in open-source fuzzing infrastructure.
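To make the idea of metric-driven degradation detection concrete, the following is a minimal sketch, not the metric implemented in OSS-Fuzz or Fuzz Introspector. It assumes a hypothetical series of per-build coverage percentages and flags the first build whose coverage falls more than a chosen number of percentage points below the best value seen so far. The function name `first_degradation` and the parameter `max_drop` are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical degradation check (an assumption for illustration only).
// `coverage` holds one coverage percentage (0 to 100) per build, oldest first.
// Returns the index of the first build whose coverage is more than `max_drop`
// percentage points below the running peak, or -1 if no such build exists.
int first_degradation(const std::vector<double>& coverage, double max_drop) {
    assert(max_drop >= 0.0);  // a negative tolerance would be meaningless
    double peak = 0.0;
    for (std::size_t i = 0; i < coverage.size(); ++i) {
        if (coverage[i] > peak) {
            peak = coverage[i];  // new best coverage seen so far
        } else if (peak - coverage[i] > max_drop) {
            return static_cast<int>(i);  // drop exceeds the tolerance
        }
    }
    return -1;
}
```

Comparing against the running peak rather than the previous build means a slow decline spread over many builds is still flagged once it exceeds the tolerance.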
📝 Abstract
The purpose of continuous fuzzing platforms is to enable fuzzing for software projects via fuzz harnesses. But as the projects continue to evolve, are these harnesses updated in lockstep, or do they fall out of date? If these harnesses remain unmaintained, will they degrade over time in terms of coverage achieved or number of bugs found? This is the subject of our study. We study Google's OSS-Fuzz continuous fuzzing platform, which contains harnesses for 510 open-source C/C++ projects, many of which are security-critical. A harness is the glue code between the fuzzer and the project, so it needs to adapt to changes in the project. It is often added by a project maintainer or as part of a testing effort, which is sometimes short-lived. Our analysis shows a consistent overall fuzzer coverage percentage for projects in OSS-Fuzz and a surprising longevity of the bug-finding capability of harnesses even without explicit updates, as long as they still build. However, we also identify and manually examine individual cases of harness coverage degradation and categorize their root causes. Furthermore, guided by this research, we contribute metrics to OSS-Fuzz and Fuzz Introspector that support detecting harness degradation in OSS-Fuzz projects.
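For readers unfamiliar with the term, a harness in the libFuzzer style used by many OSS-Fuzz projects can be sketched as follows. This is a minimal illustration, not a harness from the study: `parse_header` is a hypothetical stand-in for a project API, and only the `LLVMFuzzerTestOneInput` entry point is the real libFuzzer interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical project API under test (a stand-in for a real library function).
// Returns true when the input begins with the magic bytes "HDR".
bool parse_header(const std::string& input) {
    return input.size() >= 3 && input.compare(0, 3, "HDR") == 0;
}

// libFuzzer entry point: the fuzzer calls this repeatedly with generated bytes.
// The harness converts the raw bytes into the form the project API expects.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    std::string input(data, data + size);
    bool ok = parse_header(input);
    // Simple invariant check: a successful parse implies at least 3 bytes.
    assert(!ok || size >= 3);
    (void)ok;  // avoid an unused-variable warning when NDEBUG disables assert
    return 0;  // libFuzzer expects 0 from the harness
}
```

With Clang, such a file is typically built with `-fsanitize=fuzzer,address`, which links in the fuzzer's own `main`. If the project later renames or changes the signature of the function the harness calls, the harness stops building or stops reaching the relevant code, which is the kind of drift the study examines.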