RTLSeek: Boosting the LLM-Based RTL Generation with Multi-Stage Diversity-Oriented Reinforcement Learning

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of current large language models (LLMs) in generating register-transfer level (RTL) code: insufficient functional correctness and limited design diversity, caused by the scarcity of high-quality, verifiable training data. To overcome these challenges, the authors propose RTLSeek, a multi-stage, diversity-oriented reinforcement learning post-training framework that integrates expert-defined rules, closed-loop feedback from electronic design automation (EDA) tools, and a multi-objective reward scheduling mechanism. Through a three-phase training process, RTLSeek jointly enhances the functional correctness and structural diversity of generated RTL code, even under data-constrained conditions. Experimental results on the RTLLM benchmark show that RTLSeek significantly outperforms existing approaches, and ablation studies confirm that broader exploration of the design space improves both the quality and diversity of synthesized RTL.
📝 Abstract
Register Transfer Level (RTL) design translates high-level specifications into hardware using HDLs such as Verilog. Although LLM-based RTL generation is promising, the scarcity of functionally verifiable, high-quality data limits both accuracy and diversity. Existing post-training typically produces a single HDL implementation per specification, lacking awareness of the RTL variations needed for different design goals. We propose RTLSeek, a post-training paradigm that applies rule-based Diversity-Oriented Reinforcement Learning to improve RTL correctness and diversity. Our Diversity-Centric Multi-Objective Reward Scheduling integrates expert knowledge with EDA feedback, and a three-stage framework maximizes the utility of limited data. Experiments on the RTLLM benchmark show that RTLSeek surpasses prior methods, with ablation results confirming that encouraging broader design-space exploration improves RTL quality and achieves the principle of "the more generated, the better results." The implementation, including the dataset, source code, and model weights, is available at https://anonymous.4open.science/r/DAC2026ID71-ACB4/.
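The abstract's Diversity-Centric Multi-Objective Reward Scheduling can be pictured as a reward that blends a correctness signal (from EDA/testbench feedback) with a structural-diversity signal, with the blend weight shifting across training stages. The sketch below is purely illustrative: the `Candidate`, `correctness`, `diversity`, and `scheduled_reward` names, the Jaccard-distance diversity measure, and the stage weights are all assumptions, not the paper's actual formulation.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    passed_tests: int          # testbench cases passed (closed-loop EDA feedback)
    total_tests: int
    structure_sig: frozenset   # e.g. a set of structural features of the RTL

def correctness(c: Candidate) -> float:
    # Fraction of testbench cases the generated RTL passes.
    return c.passed_tests / c.total_tests

def diversity(c: Candidate, pool: list) -> float:
    # Mean Jaccard distance from c's structure to the other sampled candidates.
    others = [p for p in pool if p is not c]
    if not others:
        return 0.0
    dists = []
    for o in others:
        inter = len(c.structure_sig & o.structure_sig)
        union = len(c.structure_sig | o.structure_sig) or 1
        dists.append(1.0 - inter / union)
    return sum(dists) / len(dists)

def scheduled_reward(c: Candidate, pool: list, stage: int) -> float:
    # Stage-dependent weighting (illustrative values): early stages stress
    # functional correctness; later stages shift weight toward diversity.
    w_div = {1: 0.0, 2: 0.3, 3: 0.6}[stage]
    return (1 - w_div) * correctness(c) + w_div * diversity(c, pool)
```

For example, a fully correct candidate whose structure differs sharply from its peers scores higher in stage 3 than an equally correct but structurally identical one, which is the "more generated, better results" incentive the ablations examine.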
Problem

Research questions and friction points this paper is trying to address.

RTL generation
LLM
design diversity
functional correctness
HDL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diversity-Oriented Reinforcement Learning
RTL Generation
Multi-Stage Training
Multi-Objective Reward Scheduling
LLM-based Hardware Design
Xinyu Zhang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Zhiteng Chao
SKLP, ICT
computer science
Yonghao Wang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
Bin Sun
School of Computer Science & Technology, Beijing Institute of Technology
natural language processing, open-domain dialogue generation
Tianyun Ma
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Tianmeng Yang
Baidu ERNIE, Peking University
LLM, RL, Machine Learning, Data Mining
Jianan Mu
Institute of Computing Technology, State Key Laboratory of Processors (SKLP), CAS
Design Automation, Accelerator, Privacy-Preserving Computing
Jing Justin Ye
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; CASTEST Co., Ltd.
Huawei Li
Institute of Computing Technology, Chinese Academy of Sciences
computer engineering