RTLCoder: Fully Open-Source and Efficient LLM-Assisted RTL Code Generation Technique

📅 2023-12-14
🏛️ IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
📈 Citations: 57
Influential: 12
🤖 AI Summary
Existing RTL code generation methods rely heavily on commercial LLMs (e.g., the GPT series), which raises privacy risks and limits customization, while open-source alternatives deliver notably weaker performance. This work addresses these limitations with the first fully open-source RTL generation framework tailored for hardware design automation. The approach comprises three key components: (1) the first high-quality, open-source RTL code dataset; (2) a domain-specific LLM built on a lightweight 7B-parameter architecture, enhanced via RTL-oriented fine-tuning and 4-bit quantization, yielding a compact 4 GB model deployable locally on a single machine; and (3) state-of-the-art accuracy on representative benchmarks, outperforming GPT-3.5 across all metrics and surpassing GPT-4 on the VerilogEval-Machine benchmark. By jointly optimizing performance, privacy preservation, and practical deployability, the framework establishes a trustworthy, open foundation for AI-assisted hardware design.
📝 Abstract
The automatic generation of RTL code (e.g., Verilog) using natural language instructions and large language models (LLMs) has attracted significant research interest recently. However, most existing approaches rely heavily on commercial LLMs, such as ChatGPT, while open-source LLMs tailored for this specific design generation task exhibit notably inferior performance. The absence of high-quality open-source solutions restricts the flexibility and data privacy of this emerging technique. In this study, we present a new customized LLM solution with a modest parameter count of only 7B, achieving better performance than GPT-3.5 on all representative benchmarks for RTL code generation. In particular, it outperforms GPT-4 on the VerilogEval-Machine benchmark. This remarkable balance between accuracy and efficiency is made possible by leveraging our new RTL code dataset and a customized LLM algorithm, both of which have been made fully open-source. Furthermore, we have successfully quantized our LLM to 4-bit with a total size of 4 GB, enabling it to run on a single laptop with only slight performance degradation. This efficiency allows the RTL generator to serve as a local assistant for engineers, ensuring all design privacy concerns are addressed.
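The abstract's 4 GB figure for the 4-bit quantized 7B model is consistent with simple arithmetic. A back-of-envelope sketch (hypothetical, not taken from the paper; the 10% overhead factor for quantization scales and higher-precision embeddings is an assumption):

```python
# Rough on-disk memory estimate for a quantized LLM.
# Assumptions (hypothetical, not from the paper): 7.0e9 weights,
# 4 bits per weight, ~10% overhead for quantization scales and
# layers kept in higher precision.

def quantized_size_gb(n_params: float, bits_per_weight: int,
                      overhead: float = 0.10) -> float:
    """Approximate model size in GB (1 GB = 1e9 bytes)."""
    raw_bytes = n_params * bits_per_weight / 8
    return raw_bytes * (1 + overhead) / 1e9

print(f"{quantized_size_gb(7e9, 4):.1f} GB")  # ~3.9 GB, in line with the reported ~4 GB
```

With these assumptions, a 7B model at 4 bits per weight comes to roughly 3.9 GB, which matches the reported size and explains why it fits comfortably on a single laptop.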
Problem

Research questions and friction points this paper is trying to address.

Lack of high-quality open-source solutions for RTL generation
Inferior performance of open-source LLMs in RTL design
Dependence on commercial LLMs restricts flexibility and privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Customized 7B-parameter LLM for RTL generation
Open-source RTL dataset enhances performance
Lightweight solution outperforms GPT-3.5
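Benchmarks such as VerilogEval typically score models with the pass@k metric. A minimal sketch of the standard unbiased estimator (from the Codex evaluation methodology; the variable names are illustrative), assuming n generated samples of which c pass functional checking:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn without replacement from n generations passes,
    given that c of the n pass. If fewer than k samples fail,
    every k-subset must contain a passing sample."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(20, 5, 1))  # 0.25: with k=1 this reduces to the raw pass rate c/n
```

For k=1 the estimator reduces to c/n; larger k rewards models that solve a problem in at least one of several attempts, which is how RTL generators are commonly compared.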
Shang Liu
Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology (HKUST), Hong Kong SAR, China
Wenji Fang
Hong Kong University of Science and Technology
Electronic Design Automation · AI for EDA · Hardware Formal Verification
Yao Lu
Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology (HKUST), Hong Kong SAR, China
Jing Wang
Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology (HKUST), Hong Kong SAR, China
Qijun Zhang
Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology (HKUST), Hong Kong SAR, China
Hongce Zhang
Hong Kong University of Science and Technology (Guangzhou)
Logic Design & Verification · Hardware Model Checking
Zhiyao Xie
Assistant Professor, Hong Kong University of Science and Technology
EDA · Machine learning · VLSI circuits and systems