AEGIS: From Clues to Verdicts -- Graph-Guided Deep Vulnerability Reasoning via Dialectics and Meta-Auditing

📅 2026-03-21


🤖 AI Summary

This work addresses the hallucinations and high false positive rates that large language models exhibit in vulnerability detection when their reasoning lacks factual grounding. The authors propose AEGIS, a multi-agent framework that introduces dialectical reasoning and meta-auditing to this task. AEGIS constructs a closed evidence space from a repository-level code property graph and uses multi-agent collaboration to perform clue identification, dependency chain reconstruction, dialectical argument generation, and independent audit verification, shifting vulnerability detection from speculative inference toward evidence-based verification. Evaluated on the PrimeVul dataset, AEGIS achieves 122 pair-wise correct predictions (to the authors' knowledge, the first approach to surpass 100), reduces the false positive rate by up to 54.40% compared to leading baselines, and incurs an average inference cost of $0.09 per sample.
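To make the headline metrics concrete, here is a minimal sketch of how a pair-wise correct count and a false positive rate could be computed. It assumes the usual PrimeVul reading of the metric, in which a pair counts as correct only when the vulnerable function is flagged and its patched counterpart is not. The function names and the toy predictions are illustrative and do not come from the paper.

```python
from typing import List, Tuple

# Each pair holds the model's predictions for (vulnerable version, patched version).
# True means "flagged as vulnerable". The data below is made up for illustration.
Pair = Tuple[bool, bool]


def pairwise_correct(pairs: List[Pair]) -> int:
    """Count pairs where the vulnerable version is flagged and the patched one is not."""
    return sum(1 for vuln_pred, patched_pred in pairs if vuln_pred and not patched_pred)


def false_positive_rate(pairs: List[Pair]) -> float:
    """Fraction of patched (benign) versions that were wrongly flagged."""
    if not pairs:
        return 0.0
    return sum(1 for _, patched_pred in pairs if patched_pred) / len(pairs)


toy_predictions: List[Pair] = [
    (True, False),   # correct pair
    (True, True),    # false positive on the patched version
    (False, False),  # missed vulnerability
    (True, False),   # correct pair
]

if __name__ == "__main__":
    print(pairwise_correct(toy_predictions))
    print(false_positive_rate(toy_predictions))
```

Under this reading, a model that flags everything scores zero pair-wise correct predictions, which is why the metric rewards precise rather than merely sensitive detectors.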


📝 Abstract

Large Language Models (LLMs) are increasingly adopted for vulnerability detection, yet their reasoning remains fundamentally unsound. We identify a root cause shared by both major mitigation paradigms (agent-based debate and retrieval augmentation): reasoning in an ungrounded deliberative space that lacks a bounded, hypothesis-specific evidence base. Without such grounding, agents fabricate cross-function dependencies, and retrieval heuristics supply generic knowledge decoupled from the repository's data-flow topology. Consequently, the resulting conclusions are driven by rhetorical persuasiveness rather than verifiable facts. To ground this deliberation, we present AEGIS, a novel multi-agent framework that shifts detection from ungrounded speculation to forensic verification over a closed factual substrate. Guided by a "From Clue to Verdict" philosophy, AEGIS first identifies suspicious code anomalies (clues), then dynamically reconstructs per-variable dependency chains for each clue via on-demand slicing over a repository-level Code Property Graph. Within this closed evidence boundary, a Verifier Agent constructs competing dialectical arguments for and against exploitability, while an independent Audit Agent scrutinizes every claim against the trace, exercising veto power to prevent hallucinated verdicts. Evaluation on the rigorous PrimeVul dataset demonstrates that AEGIS establishes a new state-of-the-art, achieving 122 Pair-wise Correct Predictions. To our knowledge, this is the first approach to surpass 100 on this benchmark. It reduces the false positive rate by up to 54.40% compared to leading baselines, at an average cost of $0.09 per sample without any task-specific training.
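The clue-to-verdict flow described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the dictionary-based graph stands in for a real Code Property Graph, and `backward_slice`, `audit`, and all node names are hypothetical. It shows only the core idea, which is to collect the dependency chain reachable from a clue and to veto any argument that cites a dependency absent from that closed evidence set.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a Code Property Graph:
# each node maps to the nodes it directly depends on.
DepGraph = Dict[str, List[str]]


def backward_slice(graph: DepGraph, clue: str) -> Set[str]:
    """Collect every node the clue transitively depends on (breadth-first)."""
    seen = {clue}
    queue = deque([clue])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def audit(claims: List[Tuple[str, str]], graph: DepGraph, evidence: Set[str]) -> bool:
    """Return False (veto) if any claimed dependency edge is not in the evidence."""
    for user, dep in claims:
        if user not in evidence or dep not in graph.get(user, []):
            return False
    return True


# Toy graph: a copy whose length comes from an untrusted header read.
graph: DepGraph = {
    "memcpy_call": ["len", "dst"],
    "len": ["read_header"],
    "dst": ["malloc_64"],
    "log_call": ["msg"],
}

evidence = backward_slice(graph, "memcpy_call")

# An argument grounded in the slice passes the audit.
grounded = [("memcpy_call", "len"), ("len", "read_header")]
# An argument citing a dependency that does not exist is vetoed.
fabricated = [("memcpy_call", "msg")]

if __name__ == "__main__":
    print(sorted(evidence))
    print(audit(grounded, graph, evidence))
    print(audit(fabricated, graph, evidence))
```

In this sketch the audit is a plain membership check. The Audit Agent described in the abstract is itself LLM-based and scrutinizes natural-language claims against the trace, so the check here only approximates the veto behavior.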

Problem

Research questions and friction points this paper is trying to address.

vulnerability detection

large language models

reasoning grounding

evidence base

code property graph

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-Guided Reasoning

Code Property Graph

Multi-Agent Verification