RandSet: Randomized Corpus Reduction for Fuzzing Seed Scheduling

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fuzzing inefficiency caused by corpus bloat (seed explosion) by introducing RandSet, a randomized algorithm for corpus reduction. Formulating the problem as an instance of set cover, the approach computes a small, randomized seed subset that covers every feature of the full corpus, and schedules seeds from that subset instead of the entire corpus. This keeps seed selection diverse at low runtime cost, two goals that prior reduction techniques struggled to meet together. Implemented in AFL++, LibAFL, and Centipede, RandSet retains on average 4.03% of seeds on standalone programs and 5.99% on FuzzBench programs. It achieves a 16.58% coverage gain on standalone programs and up to 3.57% on FuzzBench in AFL++. On the Magma benchmark, it triggers up to 7 more ground-truth bugs than the state of the art, and it introduces only 1.17%–3.93% runtime overhead.

📝 Abstract
Seed explosion is a fundamental problem in fuzzing seed scheduling, where a fuzzer maintains a huge corpus and fails to choose promising seeds. Existing works focus on seed prioritization but still suffer from seed explosion since corpus size remains huge. We tackle this from a new perspective: corpus reduction, i.e., computing a seed corpus subset. However, corpus reduction could lead to poor seed diversity and large runtime overhead. Prior techniques like cull_queue, AFL-Cmin, and MinSet suffer from poor diversity or prohibitive overhead, making them unsuitable for high-frequency seed scheduling. We propose RandSet, a novel randomized corpus reduction technique that reduces corpus size and yields diverse seed selection simultaneously with minimal overhead. Our key insight is introducing randomness into corpus reduction to enjoy two benefits of a randomized algorithm: randomized output (diverse seed selection) and low runtime cost. Specifically, we formulate corpus reduction as a set cover problem and compute a randomized subset covering all features of the entire corpus. We then schedule seeds from this small, randomized subset rather than the entire corpus, effectively mitigating seed explosion. We implement RandSet on three popular fuzzers: AFL++, LibAFL, and Centipede, and evaluate it on standalone programs, FuzzBench, and Magma. Results show RandSet achieves significantly more diverse seed selection than other reduction techniques, with average subset ratios of 4.03% and 5.99% on standalone and FuzzBench programs, respectively. RandSet achieves a 16.58% coverage gain on standalone programs and up to 3.57% on FuzzBench in AFL++, and triggers up to 7 more ground-truth bugs than the state of the art on Magma, while introducing only 1.17%-3.93% overhead.
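
To make the set-cover formulation concrete, here is a minimal Python sketch of one possible randomized cover. It is an illustrative assumption, not the algorithm from the paper: the abstract does not spell out how RandSet draws its subset, so the function name `randomized_cover`, the seed-to-feature dictionary, and the "visit uncovered features in random order, pick a random seed covering each" strategy are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Set


def randomized_cover(
    corpus: Dict[str, Set[int]],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a random subset of seeds that covers every feature in `corpus`.

    `corpus` maps a seed name to the set of feature IDs it exercises.
    Hypothetical sketch: not the RandSet implementation.
    """
    rng = rng or random.Random()

    # Invert the mapping: feature -> list of seeds that cover it.
    seeds_by_feature: Dict[int, List[str]] = {}
    for seed, features in corpus.items():
        for feature in features:
            seeds_by_feature.setdefault(feature, []).append(seed)

    # Visit features in random order so repeated calls give different subsets.
    order = list(seeds_by_feature)
    rng.shuffle(order)

    covered: Set[int] = set()
    subset: List[str] = []
    for feature in order:
        if feature in covered:
            continue  # an earlier pick already covers this feature
        seed = rng.choice(seeds_by_feature[feature])
        subset.append(seed)
        covered |= corpus[seed]
    return subset


if __name__ == "__main__":
    # Toy corpus: four seeds with overlapping feature sets (made-up data).
    toy_corpus = {
        "seed_a": {1, 2, 3},
        "seed_b": {3, 4},
        "seed_c": {1, 4, 5},
        "seed_d": {2, 5},
    }
    subset = randomized_cover(toy_corpus)
    # Schedule the next seed from the reduced subset, not the full corpus.
    next_seed = random.choice(subset)
    print(subset, next_seed)
```

Every feature is processed once, and a picked seed marks all of its features as covered, so the result always covers the full feature set and never contains a seed twice. Because both the feature order and the seed choice are random, successive calls tend to return different subsets, which is the kind of diversity the abstract attributes to a randomized output.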

Problem

Research questions and friction points this paper is trying to address.

seed explosion
fuzzing
corpus reduction
seed scheduling
diversity

Innovation

Methods, ideas, or system contributions that make the work stand out.

corpus reduction
randomized algorithm
fuzzing seed scheduling
set cover
seed diversity
Yuchong Xie
HKUST
Security
Kaikai Zhang
Hong Kong University of Science and Technology, China
Yu Liu
Fudan University, China
Rundong Yang
Fudan University, China
Ping Chen
Fudan University, China
Shuai Wang
The Hong Kong University of Science and Technology
Computer Security, Software Engineering
Dongdong She
Hong Kong University of Science and Technology
Security, Machine Learning, Program Analysis, Fuzzing