SecureRAG-RTL: A Retrieval-Augmented, Multi-Agent, Zero-Shot LLM-Driven Framework for Hardware Vulnerability Detection

📅 2026-03-05

🤖 AI Summary

This work addresses the limited effectiveness of large language models (LLMs) in hardware security verification, which the authors attribute to the scarcity of publicly available hardware description language (HDL) data. They propose SecureRAG-RTL, a framework that applies retrieval-augmented generation (RAG) to hardware vulnerability detection. By combining multi-agent, zero-shot reasoning with domain-specific knowledge retrieval, SecureRAG-RTL compensates for gaps in LLMs’ hardware security expertise. For evaluation, the authors curate and annotate a benchmark of 14 HDL designs containing real-world security vulnerabilities, which they plan to release publicly. Across multiple LLMs of different sizes, the approach improves vulnerability detection accuracy by about 30% on average over prompt-only baselines.

📝 Abstract

Large language models (LLMs) have shown remarkable capabilities in natural language processing tasks, yet their application in hardware security verification remains limited due to scarcity of publicly available hardware description language (HDL) datasets. This knowledge gap constrains LLM performance in detecting vulnerabilities within HDL designs. To address this challenge, we propose SecureRAG-RTL, a novel Retrieval-Augmented Generation (RAG)-based approach that significantly enhances LLM-based security verification of hardware designs. Our approach integrates domain-specific retrieval with generative reasoning, enabling models to overcome inherent limitations in hardware security expertise. We establish baseline vulnerability detection rates using prompt-only methods and then demonstrate that SecureRAG-RTL achieves substantial improvements across diverse LLM architectures, regardless of size. On average, our method increases detection accuracy by about 30%, highlighting its effectiveness in bridging domain knowledge gaps. For evaluation, we curated and annotated a benchmark dataset of 14 HDL designs containing real-world security vulnerabilities, which we will release publicly to support future research. These findings underscore the potential of RAG-driven augmentation to enable scalable, efficient, and accurate hardware security verification workflows.
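
The abstract describes combining domain-specific retrieval with generative reasoning but gives no implementation detail. The following is a minimal sketch of that general pattern, not the authors' pipeline: it retrieves the most relevant notes from a tiny, hypothetical knowledge base using bag-of-words cosine similarity, then assembles a zero-shot prompt around an HDL snippet. The knowledge base entries, the names `retrieve` and `build_prompt`, the prompt wording, and the example Verilog module are all assumptions made for illustration. No LLM is called.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base (illustrative placeholders, not the paper's corpus).
KNOWLEDGE_BASE = [
    {"id": "note-debug",
     "text": "Debug or test interface left enabled allows unlock of "
             "protected registers without authentication"},
    {"id": "note-fsm",
     "text": "Finite state machine has missing default case so an illegal "
             "state can bypass the lock"},
    {"id": "note-reset",
     "text": "Security sensitive register is not cleared on reset and "
             "leaks key material"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Return up to k knowledge-base entries with nonzero similarity to query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(e["text"]))), e)
              for e in KNOWLEDGE_BASE]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:k] if score > 0]


def build_prompt(hdl_source, k=2):
    """Assemble a zero-shot prompt that places retrieved notes before the design."""
    notes = retrieve(hdl_source, k)
    context = "\n".join(f"- [{n['id']}] {n['text']}" for n in notes)
    return (
        "You are reviewing an HDL design for security weaknesses.\n"
        f"Relevant notes:\n{context}\n\n"
        f"Design:\n{hdl_source}\n\n"
        "List any weaknesses you find and cite the note id."
    )


# Hypothetical design under review (not taken from the paper's benchmark).
EXAMPLE_RTL = """
module lock(input clk, input debug_en, output reg unlock);
  // debug interface can unlock without authentication
  always @(posedge clk) if (debug_en) unlock <= 1'b1;
endmodule
"""

if __name__ == "__main__":
    print(build_prompt(EXAMPLE_RTL))
```

The string returned by `build_prompt` would then be passed to whichever LLM is being evaluated. A prompt-only baseline corresponds to omitting the "Relevant notes" section. A practical system would likely replace the bag-of-words scorer with embedding-based retrieval over a curated hardware weakness corpus.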

Problem

Research questions and friction points this paper is trying to address.

hardware security

vulnerability detection

hardware description language

large language models

domain knowledge gap

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation

Hardware Security Verification

Large Language Models