🤖 AI Summary
Manually crafting symbolic execution test harnesses requires substantial expert knowledge and does not scale well. This work proposes SAILOR, a novel approach that is the first to deeply integrate static analysis, large language models (LLMs), and symbolic execution for end-to-end automated vulnerability discovery. SAILOR first employs static analysis to identify candidate program locations and generate vulnerability specifications; it then leverages an LLM to iteratively synthesize test harnesses and assertions; finally, it applies symbolic execution to detect vulnerabilities and validates them through concrete execution replay. Evaluated on ten open-source C/C++ projects totaling 6.8 million lines of code, SAILOR uncovered 379 previously unknown memory-safety vulnerabilities (corresponding to 421 confirmed crashes), dramatically outperforming the strongest baseline, which found only 12, and thereby significantly advancing both the scalability and the precision of automated bug finding.
📝 Abstract
Symbolic execution detects vulnerabilities with precision, but applying it to large codebases requires harnesses that set up symbolic state, model dependencies, and specify assertions. Writing these harnesses has traditionally been a manual process requiring expert knowledge, which significantly limits the scalability of the technique. We present Static Analysis Informed and LLM-Orchestrated Symbolic Execution (SAILOR), which automates symbolic execution harness construction by combining static analysis with LLM-based synthesis. SAILOR operates in three phases: (1) static analysis identifies candidate vulnerable locations and generates vulnerability specifications; (2) an LLM, guided by these specifications, orchestrates harness synthesis by iteratively refining drivers, stubs, and assertions against compiler and symbolic execution feedback, after which symbolic execution detects vulnerabilities using the generated harness; and (3) concrete replay validates the symbolic execution results against the unmodified project source. This design combines the scalability of static analysis, the code reasoning of LLMs, the path precision of symbolic execution, and the ground truth produced by concrete execution. We evaluate SAILOR on 10 open-source C/C++ projects totaling 6.8 million lines of code. SAILOR discovers 379 distinct, previously unknown memory-safety vulnerabilities (421 confirmed crashes). The strongest of the five baselines we compare against (agentic vulnerability detection using Claude Code with full codebase access and unlimited interaction) finds only 12 vulnerabilities. Each phase of SAILOR is critical: without static-analysis targeting, confirmed vulnerabilities drop 12.2X; without iterative LLM synthesis, zero vulnerabilities are confirmed; and without symbolic execution, no approach detects more than 12 vulnerabilities.