RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs

📅 2026-04-20

🤖 AI Summary
This work proposes an automated framework for generating memory corruption vulnerability analysis reports by combining a multi-agent large language model (LLM) architecture with retrieval-augmented generation (RAG). The framework comprises four collaborating modules (Explorer, RAG Engine, Analyst, and Reporter) and is presented as the first application of multi-agent LLMs with RAG to vulnerability documentation. It also introduces a task-specific LLM-based Judge for multidimensional automatic evaluation of the generated reports. On 105 samples from the NIST-SARD dataset, the reports achieve an average quality score of 54.21%, which the authors take as support for the approach to automated vulnerability analysis and structured reporting.
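
The four-module flow named in the summary can be pictured as a simple chain in which each stage consumes the previous stage's output. The sketch below is only an illustration under stated assumptions: every function, class, and knowledge-base entry is a hypothetical stand-in (string matching instead of an LLM agent, a dictionary lookup instead of real retrieval), and the report section names are placeholders rather than the headings of the Google Project Zero template.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    cwe_id: str        # weakness class guessed by the Explorer stand-in
    description: str   # short free-text note about the suspected flaw


@dataclass
class Report:
    sections: dict[str, str] = field(default_factory=dict)


# Hypothetical toy knowledge base; a real system would index curated documents.
KNOWLEDGE_BASE = {
    "CWE-121": ["Stack-based Buffer Overflow (placeholder reference note)"],
}


def explorer(source_code: str) -> Finding:
    """Stand-in for the Explorer agent: naive pattern check, no LLM involved."""
    if "strcpy(" in source_code:
        return Finding("CWE-121", "unbounded copy into a fixed-size buffer")
    return Finding("UNKNOWN", "no known pattern matched")


def rag_engine(finding: Finding, knowledge_base: dict[str, list[str]]) -> list[str]:
    """Stand-in for retrieval: exact-key lookup instead of similarity search."""
    return knowledge_base.get(finding.cwe_id, [])


def analyst(finding: Finding, context: list[str]) -> str:
    """Stand-in for the Analyst agent: merges the finding with retrieved notes."""
    return f"{finding.description}; supported by {len(context)} retrieved note(s)"


def reporter(finding: Finding, analysis: str) -> Report:
    """Stand-in for the Reporter agent: fills placeholder report sections."""
    return Report(sections={"finding": finding.cwe_id, "analysis": analysis})


def run_pipeline(source_code: str, knowledge_base: dict[str, list[str]]) -> Report:
    """Chain the stages in order: Explorer, RAG Engine, Analyst, Reporter."""
    finding = explorer(source_code)
    context = rag_engine(finding, knowledge_base)
    analysis = analyst(finding, context)
    return reporter(finding, analysis)


if __name__ == "__main__":
    demo = run_pipeline("char buf[8]; strcpy(buf, input);", KNOWLEDGE_BASE)
    for name, text in demo.sections.items():
        print(f"{name}: {text}")
```

Keeping each stage behind a plain function boundary makes it easy to swap a stub for an actual agent or retriever without touching the rest of the chain.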

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across various cybersecurity tasks, including vulnerability classification, detection, and patching. However, their potential in automated vulnerability report documentation and analysis remains underexplored. We present RAVEN (Retrieval-Augmented Vulnerability Exploration Network), a framework leveraging LLM agents and Retrieval-Augmented Generation (RAG) to synthesize comprehensive vulnerability analysis reports. Given vulnerable source code, RAVEN generates reports following the Google Project Zero Root Cause Analysis template. The framework uses four modules: an Explorer agent for vulnerability identification, a RAG engine that retrieves relevant knowledge from curated databases including Google Project Zero reports and CWE entries, an Analyst agent for impact and exploitation assessment, and a Reporter agent for structured report generation. To ensure quality, RAVEN includes a task-specific LLM Judge that evaluates reports across structural integrity, ground truth alignment, code reasoning quality, and remediation quality. We evaluate RAVEN on 105 vulnerable code samples covering 15 CWE types from the NIST-SARD dataset. Results show an average quality score of 54.21%, supporting the effectiveness of our approach for automated vulnerability documentation.
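
To make the idea of a multi-dimensional quality score concrete, the sketch below shows one way per-dimension judgments could be folded into a single percentage and then averaged over a dataset. It is an assumption-laden illustration, not the paper's rubric: the abstract names the four dimensions but not their scale or weighting, so the 0 to 1 scale, the equal weights, and the function names `report_score` and `dataset_score` are all hypothetical.

```python
from statistics import mean

# Dimension names taken from the abstract; equal weighting is an assumption.
DIMENSIONS = (
    "structural_integrity",
    "ground_truth_alignment",
    "code_reasoning_quality",
    "remediation_quality",
)


def report_score(scores: dict[str, float]) -> float:
    """Return one report's quality as a percentage (assumed 0-1 inputs)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 0.0 <= scores[d] <= 1.0:
            raise ValueError(f"{d} must lie in [0, 1], got {scores[d]}")
    # Unweighted mean of the four dimensions, scaled to a percentage.
    return 100.0 * mean(scores[d] for d in DIMENSIONS)


def dataset_score(per_report: list[dict[str, float]]) -> float:
    """Average the per-report percentages over all evaluated samples."""
    if not per_report:
        raise ValueError("no reports to score")
    return mean(report_score(s) for s in per_report)


if __name__ == "__main__":
    # Made-up judgments for two reports, for illustration only.
    judged = [
        dict.fromkeys(DIMENSIONS, 0.5),
        dict.fromkeys(DIMENSIONS, 1.0),
    ]
    print(f"average quality: {dataset_score(judged):.2f}%")
```

A weighted variant would only change the aggregation line in `report_score`; the validation and dataset-level averaging would stay the same.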
Problem

Research questions and friction points this paper is trying to address.

vulnerability analysis
memory corruption
automated reporting
LLM agents
cybersecurity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
Large Language Model Agents
Automated Vulnerability Analysis
Memory Corruption
Structured Reporting
Parteek Jamwal
New York University Abu Dhabi, UAE
Minghao Shao
New York University, USA
Boyuan Chen
New York University, USA
Achyuta Muthuvelan
New York University Abu Dhabi, UAE
Asini Subanya
New York University Abu Dhabi, UAE
Boubacar Ballo
New York University Abu Dhabi, UAE
Kashish Satija
New York University Abu Dhabi, UAE
Mariam Shafey
New York University Abu Dhabi, UAE
Mohamed Mahmoud
New York University Abu Dhabi, UAE
Moncif Dahaji Bouffi
New York University Abu Dhabi, UAE
Pasindu Wickramasinghe
New York University Abu Dhabi, UAE
Siyona Goel
New York University Abu Dhabi, UAE
Yaakulya Sabbani
New York University Abu Dhabi, UAE
Hakim Hacid
Technology Innovation Institute (TII), UAE
Machine Learning, LLM, Databases, Information Retrieval, Edge ML
Mthandazo Ndhlovu
Technology Innovation Institute, UAE
Eleanna Kafeza
Technology Innovation Institute, UAE
Sanjay Rawat
Principal Security Researcher, Technology Innovation Institute (TII)
Static and dynamic security program analysis, vulnerability analysis, intrusion detection system
Muhammad Shafique
Professor, ECE, New York University (AD-UAE, Tandon-USA), Director eBRAIN Lab
Embedded Machine Learning, Brain-Inspired Computing, Robust & Energy-Efficient System Design, Smart