HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Large language models (LLMs) show limited comprehension and generation ability on repository-level hardware description language (HDL) projects, which typically span thousands to tens of thousands of lines of code. Method: This paper introduces a dual-view HDL graph database that integrates abstract syntax trees (ASTs) and data flow graphs (DFGs), coupled with a task-adaptive Graph Retrieval-Augmented Generation (Graph RAG) framework. It proposes the first multi-granularity retrieval mechanism tailored for HDL that jointly uses structural and semantic information. Contribution/Results: The authors construct HDLSearch, the first real-world, repository-level HDL search benchmark. Experimental results show that the approach improves search accuracy, debugging efficiency, and code completion quality by 12.04%, 12.22%, and 5.04%, respectively, over similarity-based RAG baselines. The source code and the HDLSearch benchmark are publicly released.
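
To make the dual-view idea concrete, the following is a minimal sketch of a graph store that keeps AST-style containment edges (the code view) and DFG-style driver edges (the hardware view) side by side. It is not the HDLxGraph implementation: the class `DualViewGraph`, its methods, and the hand-built `adder` example are all hypothetical, and the edges are entered by hand rather than produced by a real HDL parser.

```python
from collections import defaultdict


class DualViewGraph:
    """Toy store with two edge sets: AST containment and DFG dataflow."""

    def __init__(self):
        # Code view: parent construct -> child constructs (hypothetical schema).
        self.ast_edges = defaultdict(set)
        # Hardware view: driver signal -> driven signals (hypothetical schema).
        self.dfg_edges = defaultdict(set)

    def add_ast_edge(self, parent, child):
        self.ast_edges[parent].add(child)

    def add_dfg_edge(self, src, dst):
        self.dfg_edges[src].add(dst)

    def children(self, node):
        """Direct AST children of a node, in sorted order."""
        return sorted(self.ast_edges.get(node, ()))

    def fan_in(self, signal):
        """All signals that directly or transitively drive `signal`."""
        reverse = defaultdict(set)
        for src, dsts in self.dfg_edges.items():
            for dst in dsts:
                reverse[dst].add(src)
        seen, stack = set(), [signal]
        while stack:
            for src in reverse[stack.pop()]:
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return sorted(seen)


# Hand-built graph for an assumed two-statement module:
#   assign t = a ^ b;
#   assign y = t;
g = DualViewGraph()
g.add_ast_edge("adder", "assign t = a ^ b")
g.add_ast_edge("adder", "assign y = t")
g.add_dfg_edge("a", "t")
g.add_dfg_edge("b", "t")
g.add_dfg_edge("t", "y")

print(g.children("adder"))  # statements contained in the module
print(g.fan_in("y"))        # signals that influence output y
```

The AST view answers "what does this module contain", while the DFG view answers "which signals influence this output", a question that text similarity alone cannot answer.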

📝 Abstract
Large Language Models (LLMs) have demonstrated their potential in hardware design tasks, such as Hardware Description Language (HDL) generation and debugging. Yet their performance degrades on real-world, repository-level HDL projects with thousands or even tens of thousands of lines of code. To this end, we propose HDLxGraph, a novel framework that integrates Graph Retrieval Augmented Generation (Graph RAG) with LLMs, introducing HDL-specific graph representations that incorporate Abstract Syntax Trees (ASTs) and Data Flow Graphs (DFGs) to capture both the code graph view and the hardware graph view. HDLxGraph uses a dual-retrieval mechanism that not only mitigates the limited recall inherent in similarity-based semantic retrieval by incorporating structural information, but also extends to various real-world tasks through task-specific retrieval fine-tuning. Additionally, to address the lack of comprehensive HDL search benchmarks, we introduce HDLSearch, a multi-granularity evaluation dataset derived from real-world repository-level projects. Experimental results demonstrate that HDLxGraph improves average search accuracy, debugging efficiency, and completion quality by 12.04%, 12.22%, and 5.04%, respectively, compared to similarity-based RAG. The code of HDLxGraph and the collected HDLSearch benchmark are available at https://github.com/Nick-Zheng-Q/HDLxGraph.
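
The recall problem that the dual-retrieval mechanism targets can be illustrated with a small sketch. Everything here is an assumption made for illustration: a bag-of-words cosine score stands in for a learned embedding model, `dual_retrieve` is a hypothetical helper, and the module names and descriptions are invented. It is not the retrieval pipeline of HDLxGraph.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def dual_retrieve(query, docs, edges, k=1):
    """Take the top-k items by similarity, then add their graph neighbours."""
    ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
    seeds = ranked[:k]
    expanded = list(seeds)
    for seed in seeds:
        for neighbour in sorted(edges.get(seed, ())):
            if neighbour not in expanded:
                expanded.append(neighbour)
    return expanded


# Invented module descriptions and an invented "instantiates" edge.
docs = {
    "alu": "arithmetic logic unit top module",
    "full_adder": "one bit full adder with carry output",
    "regfile": "register file with two read ports",
}
edges = {"alu": {"full_adder"}}

semantic_only = dual_retrieve("arithmetic logic unit", docs, {}, k=1)
with_structure = dual_retrieve("arithmetic logic unit", docs, edges, k=1)
print(semantic_only)
print(with_structure)
```

In this toy setup `full_adder` shares no words with the query, so similarity alone misses it; following the structural edge from `alu` brings it into the result set.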
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM performance in large HDL projects
Integrating graph RAG with ASTs and DFGs
Addressing lack of HDL search benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Graph RAG with LLMs for HDL
Uses ASTs and DFGs for code representation
Introduces HDLSearch benchmark for evaluation
Authors
Pingqing Zheng
University of Minnesota, Twin Cities, Minneapolis, MN, USA
Jiayin Qin
University of Minnesota, Twin Cities, Minneapolis, MN, USA
Fuqi Zhang
University of Minnesota, Twin Cities, Minneapolis, MN, USA
Shang Wu
Unknown affiliation
Yu Cao
University of Minnesota, Twin Cities, Minneapolis, MN, USA
Caiwen Ding
Associate Professor, University of Minnesota - Twin Cities
Efficient Machine Learning, ML for EDA, Computer Architecture
Yang (Katie) Zhao
University of Minnesota, Twin Cities, Minneapolis, MN, USA