Scheduzz: Constraint-based Fuzz Driver Generation with Dual Scheduling

📅 2025-07-24
🤖 AI Summary
Automatically generated fuzz drivers for libraries often violate usage conventions (for example, by failing to release a resource they opened), which produces false-positive bug reports and wastes computation. To address this, we propose Scheduzz, an automated fuzz driver generation method that combines large language models (LLMs) with a dual scheduling mechanism. First, an LLM extracts API usage constraints from documentation and source code. Second, a "generation–execution" dual scheduling framework models driver generation and the corresponding fuzzing campaign as an online optimization problem, so that constraint-compliant API combinations are selected for driver generation and drivers are scheduled for execution or suspension. Evaluated on 33 real-world libraries, Scheduzz achieves 1.62×, 1.50×, and 1.89× higher overall coverage than CKGFuzzer, PromptFuzz, and OSS-Fuzz, respectively, discovers 33 previously unknown bugs (3 of which have been assigned CVEs), and reduces computational overhead compared to baseline approaches.

📝 Abstract
Fuzzing a library requires experts who understand its usage well to craft high-quality fuzz drivers, which is tricky and tedious. Many techniques have therefore been proposed to generate fuzz drivers automatically. However, they fail to generate rational fuzz drivers because they do not adhere to proper library usage conventions, such as ensuring that a resource is closed after being opened. Worse, existing library fuzzing techniques execute every driver unconditionally, so numerous irrational drivers waste computational resources while contributing little coverage and generating false-positive bug reports. To tackle these challenges, we propose Scheduzz, a novel LLM-based automatic library fuzzing technique. It leverages LLMs to understand rational library usage and extract API combination constraints. To optimize computational resource utilization, it implements a dual scheduling framework that efficiently manages API combinations and fuzz drivers. The framework models driver generation and the corresponding fuzzing campaign as an online optimization problem. Within the scheduling loop, multiple API combinations are selected to generate fuzz drivers while, simultaneously, optimized fuzz drivers are scheduled for execution or suspension. We implemented Scheduzz and evaluated it on 33 real-world libraries. Compared to baseline approaches, Scheduzz significantly reduces computational overhead and outperforms UTopia on 16 out of 21 libraries. It achieves 1.62x, 1.50x, and 1.89x higher overall coverage than the state-of-the-art techniques CKGFuzzer and PromptFuzz and the handcrafted project OSS-Fuzz, respectively. In addition, Scheduzz discovered 33 previously unknown bugs in these well-tested libraries, 3 of which have been assigned CVEs.
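
To make the scheduling loop described in the abstract more concrete, here is a minimal Python sketch. It is not the paper's algorithm: the policy (admit one new driver per round, suspend a driver after one round without new coverage) is a simple stand-in for the online optimization, and `Driver`, `run_slice`, `schedule`, and the integer "edges" are hypothetical names and simulated data chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    """A toy fuzz driver; `edges` stands in for the code it can reach."""
    name: str
    edges: frozenset
    stale_rounds: int = 0      # consecutive rounds with no new coverage
    suspended: bool = False


def run_slice(driver, covered, budget=2):
    """Simulate one time slice: discover up to `budget` new edges, in sorted order."""
    new = sorted(driver.edges - covered)[:budget]
    covered.update(new)
    return len(new)


def schedule(candidates, rounds=6, batch=1, patience=1):
    """Toy dual scheduling loop (assumed policy, for illustration only).

    Each round first admits up to `batch` new drivers (the generation side),
    then runs every active driver for one slice and suspends any driver that
    has gone `patience` rounds without new coverage (the execution side).
    """
    pending = list(candidates)
    active, covered, suspended_order = [], set(), []
    for _ in range(rounds):
        # Generation side: bring new drivers into the pool.
        for _ in range(min(batch, len(pending))):
            active.append(pending.pop(0))
        # Execution side: run active drivers, suspend unproductive ones.
        for d in active:
            if d.suspended:
                continue
            gained = run_slice(d, covered)
            d.stale_rounds = 0 if gained else d.stale_rounds + 1
            if d.stale_rounds >= patience:
                d.suspended = True
                suspended_order.append(d.name)
    return covered, suspended_order


drivers = [
    Driver("A", frozenset({1, 2, 3})),
    Driver("B", frozenset({2, 3})),   # redundant with A, so it gains nothing
    Driver("C", frozenset({4, 5})),
]
covered, suspended_order = schedule(drivers)
print(sorted(covered), suspended_order)
```

In this toy run the redundant driver `B` is suspended first, because everything it can reach has already been covered by `A`. That is the kind of wasted execution the abstract says unconditional scheduling incurs.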
Problem

Research questions and friction points this paper is trying to address.

Automates fuzz driver generation for libraries
Ensures adherence to library usage constraints
Optimizes computational resource utilization via scheduling
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based library fuzzing technique
Dual scheduling framework for efficiency
API combination constraints extraction
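
As an illustration of what an API combination constraint might look like, the Python sketch below checks the "a resource is closed after being opened" convention mentioned in the abstract against a candidate call sequence. The constraint format (a mapping from an acquiring API to its releasing API) and the names `violations`, `res_open`, `res_read`, and `res_close` are assumptions made for this example, not the representation used by Scheduzz.

```python
def violations(calls, pairs):
    """Return constraint violations for an ordered list of API names.

    `pairs` maps an acquiring API to the API that must release its resource
    (a hypothetical constraint format, used here only for illustration).
    """
    problems = []
    open_count = {acq: 0 for acq in pairs}
    releaser_of = {rel: acq for acq, rel in pairs.items()}
    for i, call in enumerate(calls):
        if call in pairs:
            open_count[call] += 1
        elif call in releaser_of:
            acq = releaser_of[call]
            if open_count[acq] == 0:
                problems.append(f"{call} at position {i} has no prior {acq}")
            else:
                open_count[acq] -= 1
    # Anything still open at the end was never released.
    for acq, n in open_count.items():
        if n:
            problems.append(f"{acq} is never matched by {pairs[acq]} ({n} left open)")
    return problems


pairs = {"res_open": "res_close"}
print(violations(["res_open", "res_read", "res_close"], pairs))
print(violations(["res_open", "res_read"], pairs))
```

A combination that fails such a check would yield an irrational driver (here, one that leaks the opened resource), so a generator could discard it before spending any fuzzing time on it.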
Authors

Yan Li
University of Science and Technology of China
Wenzhang Yang
Institute of AI for Industries
Yuekun Wang
Singapore Management University
Jian Gao
Central University of Finance and Economics
Shaohua Wang
Central University of Finance and Economics
Yinxing Xue
Research Professor, Chinese Academy of Sciences
Software Engineering, Software Security, Program Analysis, Search Based Software Engineering
Lijun Zhang
Institute of Software, Chinese Academy of Sciences