Veritas: A Semantically Grounded Agentic Framework for Memory Corruption Vulnerability Detection in Binaries

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Detecting memory corruption vulnerabilities in stripped binaries is hindered by semantic loss and path complexity. This work proposes a semantics-driven, multi-agent detection framework that applies semantic grounding systematically to binary vulnerability analysis. A static slicer over RetDec-lifted LLVM IR recovers object semantics and value-flow relations. A dual-view large language model (LLM) detector, reasoning over both decompiled C code and LLVM IR, then generates candidate vulnerabilities. A multi-agent validator verifies these candidates dynamically using breakpoint inspection and memory-checking oracles. On a benchmark of real-world vulnerability cases, the approach achieves 90% recall. Exhaustive validation and manual verification of 623 candidates found zero false positives, while an additional audit of candidates from larger cases found two confirmed false positives. The framework also uncovered a previously unknown Apple vulnerability, which was assigned a CVE.
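The validation step can be illustrated with a small Python sketch of one possible decision rule. It combines a memory-checking oracle report with a breakpoint observation. This is a hypothetical stand-in, not the paper's multi-agent validator. `Candidate`, `BreakpointObservation`, `validate`, and the three verdict labels are all illustrative assumptions. The only external convention it relies on is the header line AddressSanitizer prints when it detects an error (`ERROR: AddressSanitizer: <bug-type> ...`).

```python
import re
from dataclasses import dataclass
from typing import Optional

# AddressSanitizer reports begin with a header such as
# "==4242==ERROR: AddressSanitizer: heap-buffer-overflow on address ..."
ASAN_HEADER = re.compile(r"ERROR: AddressSanitizer: ([\w-]+)")


@dataclass
class Candidate:
    bug_type: str  # hypothesised bug class, e.g. "heap-buffer-overflow"
    sink: str      # hypothetical label for the suspected sink


@dataclass
class BreakpointObservation:
    copy_len: int      # length read from the sink's arguments at a breakpoint
    dst_capacity: int  # size of the destination object at that point


def validate(candidate: Candidate, oracle_log: str,
             bp: Optional[BreakpointObservation]) -> str:
    """Hypothetical rule: an oracle report of the hypothesised class confirms;
    weaker or conflicting evidence is escalated; no evidence rejects."""
    match = ASAN_HEADER.search(oracle_log)
    if match:
        # Oracle fired: confirm only if it reports the class we hypothesised.
        return "confirmed" if match.group(1) == candidate.bug_type else "needs-review"
    if bp is not None and bp.copy_len > bp.dst_capacity:
        # Debugger shows an out-of-bounds length, but no oracle report backs it.
        return "needs-review"
    return "rejected"


# Usage with a sample oracle header line (address and pc values are made up).
log = "==4242==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000020 at pc 0x4f2a1b"
c = Candidate("heap-buffer-overflow", "memcpy.len")
print(validate(c, log, None))                          # confirmed
print(validate(c, "", BreakpointObservation(64, 16)))  # needs-review
print(validate(c, "", BreakpointObservation(8, 16)))   # rejected
```

The sketch treats a matching oracle report as the strongest evidence and debugger-only evidence as grounds for escalation rather than confirmation. A real validator would also have to drive the binary to the suspected trigger before either kind of evidence exists.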

📝 Abstract

Detecting memory corruption vulnerabilities in stripped binaries requires recovering object semantics, interprocedural propagation, and feasible triggers from low-level, lossy representations. Recent LLM-based approaches improve code understanding, but reliable detection still requires grounding in memory-relevant semantics and runtime feasibility evidence. We present Veritas, a semantically grounded framework for binary memory corruption vulnerability detection. Veritas combines a static slicer over RetDec-lifted LLVM IR, a dual-view LLM detector that reasons step by step over grounded flows using decompiled C and selective LLVM IR, and a multi-agent validator that checks hypotheses against debugger-visible artifacts and runtime evidence. The slicer reconstructs value-flow relations from LLVM-IR facts, including def-use, calls, returns, globals, and pointer operations, and emits compact witness-backed flow objects. The detector uses these artifacts to reason about control flow, bounds, and object correspondence without rediscovering whole-binary propagation. The validator confirms or rejects candidates through guided debugging, breakpoint inspection, and memory-checking oracles. We implement Veritas as a modular pipeline and evaluate it on a curated benchmark of real-world binary vulnerability cases. Veritas achieves 90% recall. For false-positive assessment, we exhaustively validate and manually verify 623 detector candidates and audit additional candidates from larger cases. The exhaustive subset produces no false positives, while the additional audit identifies two confirmed false positives. In a real-world application, Veritas discovered a previously unknown Apple vulnerability that was confirmed and assigned a CVE. These results support semantic grounding as an operational design principle for practical binary vulnerability detection.
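To make the idea of witness-backed flow objects concrete, here is a minimal Python sketch. It chains value-flow facts into flow objects that each carry the edges justifying them. The sketch is deliberately simplified and flow-insensitive. The edge kinds, the `Edge` and `FlowObject` types, `slice_flows`, and the `%`-style value names are hypothetical stand-ins, not Veritas's data model or RetDec's output. The real slicer operates on RetDec-lifted LLVM IR.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical edge kinds, loosely mirroring the LLVM-IR facts the abstract lists:
# def-use chains, call arguments, returns, globals, and pointer operations.
EDGE_KINDS = {"def_use", "call_arg", "ret", "global", "ptr"}


@dataclass(frozen=True)
class Edge:
    kind: str
    src: str
    dst: str


@dataclass
class FlowObject:
    source: str
    sink: str
    witness: list = field(default_factory=list)  # ordered edges source -> ... -> sink


def build_graph(edges):
    """Index edges by source value, rejecting unknown edge kinds."""
    graph = {}
    for e in edges:
        if e.kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {e.kind}")
        graph.setdefault(e.src, []).append(e)
    return graph


def slice_flows(edges, sources, sinks):
    """Forward BFS from each source; emit one shortest witness path per reachable sink."""
    graph = build_graph(edges)
    flows = []
    for src in sources:
        parent = {src: None}  # value -> edge through which it was first reached
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for e in graph.get(node, []):
                if e.dst not in parent:
                    parent[e.dst] = e
                    queue.append(e.dst)
        for sink in sinks:
            if sink == src or sink not in parent:
                continue
            witness, cur = [], sink
            while parent[cur] is not None:  # walk back from the sink to the source
                witness.append(parent[cur])
                cur = parent[cur].src
            witness.reverse()
            flows.append(FlowObject(src, sink, witness))
    return flows


# Usage: a length returned by parse() reaches memcpy's length argument.
facts = [
    Edge("ret", "parse.%ret", "main.%len"),
    Edge("def_use", "main.%len", "main.%n"),
    Edge("call_arg", "main.%n", "copy_buf.%size"),
    Edge("def_use", "copy_buf.%size", "memcpy.len"),
    Edge("ptr", "main.%buf", "copy_buf.%dst"),
]
flows = slice_flows(facts, sources=["parse.%ret"], sinks=["memcpy.len", "copy_buf.%dst"])
for f in flows:
    print(f.source, "->", f.sink, [e.kind for e in f.witness])
# prints: parse.%ret -> memcpy.len ['ret', 'def_use', 'call_arg', 'def_use']
```

Each witness is stored as an explicit edge chain. A downstream consumer can therefore inspect why a flow holds without recomputing whole-binary reachability, which matches the role the abstract assigns to the detector's inputs.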

Problem

Research questions and friction points this paper is trying to address.

memory corruption

binary analysis

vulnerability detection

semantic grounding

stripped binaries

Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic grounding

binary vulnerability detection

LLM-based reasoning