DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis

📅 2025-02-25
🤖 AI Summary
Problem: Existing RTL datasets are limited to file-level code or physical layouts and lack multi-granularity benchmarks that support RTL understanding, generation, and joint PPA (power, performance, and area) analysis. Method: DeepCircuitX introduces a structured RTL dataset spanning four hierarchical levels (repository, file, module, and block), integrated with Chain-of-Thought functional annotations, Yosys-synthesized netlists, and OpenSTA-extracted PPA metrics. Contribution/Results: It enables unified repository-level RTL modeling and multi-level semantic annotation, and it supports PPA prediction directly from RTL source code, bridging the abstraction gap between high-level description and physical implementation. Experiments show performance gains for open-source LLMs fine-tuned on the dataset across RTL understanding, generation, and PPA prediction tasks; human evaluation confirms the annotation quality. The dataset is publicly released.

📝 Abstract
This paper introduces DeepCircuitX, a comprehensive repository-level dataset designed to advance RTL (Register Transfer Level) code understanding, generation, and power-performance-area (PPA) analysis. Unlike existing datasets that are limited to either file-level RTL code or physical layout data, DeepCircuitX provides a holistic, multilevel resource that spans repository, file, module, and block-level RTL code. This structure enables more nuanced training and evaluation of large language models (LLMs) for RTL-specific tasks. DeepCircuitX is enriched with Chain of Thought (CoT) annotations, offering detailed descriptions of functionality and structure at multiple levels. These annotations enhance its utility for a wide range of tasks, including RTL code understanding, generation, and completion. Additionally, the dataset includes synthesized netlists and PPA metrics, facilitating early-stage design exploration and enabling accurate PPA prediction directly from RTL code. We demonstrate the dataset's effectiveness on various LLMs fine-tuned with our dataset and confirm its quality with human evaluations. Our results highlight DeepCircuitX as a critical resource for advancing RTL-focused machine learning applications in hardware design automation. Our data is available at https://zeju.gitbook.io/lcm-team.
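
To make the file, module, and block levels mentioned in the abstract concrete, here is a minimal sketch of how one Verilog file could be split into module-level and block-level units. It is an illustration only, not the authors' pipeline: the `Module` and `Block` record types, the `split_modules` helper, and the regex-based splitting are all assumptions, and a real flow would use a proper Verilog parser.

```python
import re
from dataclasses import dataclass, field

# Hypothetical record types; field names are illustrative, not the dataset's schema.
@dataclass
class Block:
    kind: str    # "always", "always_ff", "assign", ...
    header: str  # first line of the block, kept as a cheap identifier

@dataclass
class Module:
    name: str
    source: str
    blocks: list = field(default_factory=list)

COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
MODULE_RE = re.compile(r"\bmodule\s+(\w+).*?\bendmodule\b", re.DOTALL)
BLOCK_RE = re.compile(r"^[ \t]*((always\w*|assign)\b[^\n]*)", re.MULTILINE)

def split_modules(verilog_text):
    """Naively split one file into modules, then list the blocks in each module."""
    text = COMMENT_RE.sub("", verilog_text)  # drop comments so keywords in them are ignored
    modules = []
    for match in MODULE_RE.finditer(text):
        module = Module(name=match.group(1), source=match.group(0))
        for block in BLOCK_RE.finditer(module.source):
            module.blocks.append(Block(kind=block.group(2), header=block.group(1).strip()))
        modules.append(module)
    return modules

SAMPLE = """
// simple counter
module counter(input clk, input rst, output reg [3:0] q);
  always @(posedge clk) begin
    if (rst) q <= 0; else q <= q + 1;
  end
endmodule

module and2(input a, input b, output y);
  assign y = a & b;
endmodule
"""

mods = split_modules(SAMPLE)
for mod in mods:
    print(mod.name, [blk.kind for blk in mod.blocks])
```

Running the sketch on `SAMPLE` lists each module name with the kinds of blocks found inside it. The regex approach ignores string literals, macros, and nested constructs, so it only shows the shape of a multilevel split.
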
Problem

Research questions and friction points this paper is trying to address.

Enhance RTL code understanding and generation.
Enable accurate PPA analysis from RTL code.
Provide a comprehensive multilevel RTL dataset.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive multi-level RTL dataset
Chain of Thought annotations included
Synthesized netlists with PPA metrics
Zeju Li
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong S.A.R.; National Center of Technology Innovation for EDA, Nanjing, China
Changran Xu
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong S.A.R.; National Center of Technology Innovation for EDA, Nanjing, China
Zhengyuan Shi
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong S.A.R.; National Center of Technology Innovation for EDA, Nanjing, China
Zedong Peng
MIT
Operations Research, Optimization, Mixed-Integer Programming, Process System Engineering
Yi Liu
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong S.A.R.; National Center of Technology Innovation for EDA, Nanjing, China
Yunhao Zhou
Shanghai Jiao Tong University
EDA, GNN, LLM
Lingfeng Zhou
Shanghai Jiao Tong University
Chengyu Ma
Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, China; National Center of Technology Innovation for EDA, Nanjing, China
Jianyuan Zhong
The Chinese University of Hong Kong
Machine Learning
Xi Wang
School of Integrated Circuit, Southeast University, Nanjing, China; National Center of Technology Innovation for EDA, Nanjing, China
Jieru Zhao
Associate Professor, Shanghai Jiao Tong University
Hardware-software co-design, AI acceleration and system, Compiler, FPGA, High-level synthesis
Zhufei Chu
Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, China; National Center of Technology Innovation for EDA, Nanjing, China
Xiaoyan Yang
Advanced Digital Sciences Center
database, deep learning, text mining
Qiang Xu
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong S.A.R.; National Center of Technology Innovation for EDA, Nanjing, China