AI Summary
Existing hybrid testing tools that combine fuzzing, symbolic execution, and sampling underuse the latter two techniques. Their tailored symbolic execution engines over-prune branches, so the engine wastes time waiting for fuzzer seeds and misses crashes, and sampling is not applied to the branches where it would help most. This work proposes S²F, a hybrid testing tool built on an architecture that combines the precision of conventional symbolic execution with the scalability of tailored symbolic execution engines, together with a set of principles for coordinating fuzzing, symbolic execution, and sampling. In experiments on 15 real-world programs, S²F outperforms the state-of-the-art tool by an average of 6.14% in edge coverage and 32.6% in discovered crashes, and it uncovers three previously unknown crashes.
Abstract
Hybrid testing that integrates fuzzing, symbolic execution, and sampling has demonstrated superior testing efficiency compared to individual techniques. However, state-of-the-art (SOTA) hybrid testing tools do not fully exploit the capabilities of symbolic execution and sampling in two key respects. First, they employ tailored symbolic execution engines that tend to over-prune branches, which wastes considerable time waiting for seeds from the fuzzer and misses opportunities to discover crashes. Second, they do not apply sampling to the appropriate branches and therefore cannot use its full capability. To address these two limitations, we propose a novel hybrid testing architecture that combines the precision of conventional symbolic execution with the scalability of tailored symbolic execution engines. Based on this architecture, we propose several principles for combining fuzzing, symbolic execution, and sampling. We implement our method in a hybrid testing tool, S$^2$F. To evaluate its effectiveness, we conduct extensive experiments on 15 real-world programs. Experimental results demonstrate that S$^2$F outperforms the SOTA tool, achieving an average improvement of 6.14% in edge coverage and 32.6% in discovered crashes. Notably, our tool uncovers three previously unknown crashes in real-world programs.