RTLSeek: Boosting the LLM-Based RTL Generation with Multi-Stage Diversity-Oriented Reinforcement Learning

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of current large language models (LLMs) in generating register-transfer level (RTL) code: insufficient functional correctness and limited design diversity, caused by the scarcity of high-quality, verifiable training data. To overcome these challenges, the authors propose RTLSeek, a multi-stage, diversity-oriented reinforcement learning post-training framework that integrates expert-defined rules, closed-loop feedback from electronic design automation (EDA) tools, and a multi-objective reward scheduling mechanism. Through a three-phase training process, RTLSeek jointly enhances the functional correctness and structural diversity of generated RTL code, even under data-constrained conditions. Experimental results on the RTLLM benchmark show that RTLSeek significantly outperforms existing approaches, and ablation studies confirm that broader exploration of the design space improves both the quality and diversity of synthesized RTL.
📝 Abstract
Register Transfer Level (RTL) design translates high-level specifications into hardware using HDLs such as Verilog. Although LLM-based RTL generation is promising, the scarcity of functionally verifiable, high-quality data limits both accuracy and diversity. Existing post-training typically produces a single HDL implementation per specification, lacking awareness of the RTL variations needed for different design goals. We propose RTLSeek, a post-training paradigm that applies rule-based Diversity-Oriented Reinforcement Learning to improve RTL correctness and diversity. Our Diversity-Centric Multi-Objective Reward Scheduling integrates expert knowledge with EDA feedback, and a three-stage framework maximizes the utility of limited data. Experiments on the RTLLM benchmark show that RTLSeek surpasses prior methods, with ablation results confirming that encouraging broader design-space exploration improves RTL quality and achieves the principle of "the more generated, the better results." The implementation, including the dataset, source code, and model weights, is available at https://anonymous.4open.science/r/DAC2026ID71-ACB4/.
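The abstract's Diversity-Centric Multi-Objective Reward Scheduling can be pictured as a reward that blends a correctness signal (from EDA/testbench feedback) with a structural-diversity signal, with the blend weight shifting across training stages. The sketch below is purely illustrative: the `Candidate`, `correctness`, `diversity`, and `scheduled_reward` names, the Jaccard-distance diversity measure, and the stage weights are all assumptions, not the paper's actual formulation.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    passed_tests: int          # testbench cases passed (closed-loop EDA feedback)
    total_tests: int
    structure_sig: frozenset   # e.g. a set of structural features of the RTL

def correctness(c: Candidate) -> float:
    # Fraction of testbench cases the generated RTL passes.
    return c.passed_tests / c.total_tests

def diversity(c: Candidate, pool: list) -> float:
    # Mean Jaccard distance from c's structure to the other sampled candidates.
    others = [p for p in pool if p is not c]
    if not others:
        return 0.0
    dists = []
    for o in others:
        inter = len(c.structure_sig & o.structure_sig)
        union = len(c.structure_sig | o.structure_sig) or 1
        dists.append(1.0 - inter / union)
    return sum(dists) / len(dists)

def scheduled_reward(c: Candidate, pool: list, stage: int) -> float:
    # Stage-dependent weighting (illustrative values): early stages stress
    # functional correctness; later stages shift weight toward diversity.
    w_div = {1: 0.0, 2: 0.3, 3: 0.6}[stage]
    return (1 - w_div) * correctness(c) + w_div * diversity(c, pool)
```

For example, a fully correct candidate whose structure differs sharply from its peers scores higher in stage 3 than an equally correct but structurally identical one, which is the "more generated, better results" incentive the ablations examine.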
Problem

Research questions and friction points this paper is trying to address.

RTL generation
LLM
design diversity
functional correctness
HDL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diversity-Oriented Reinforcement Learning
RTL Generation
Multi-Stage Training
Multi-Objective Reward Scheduling
LLM-based Hardware Design
Xinyu Zhang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Zhiteng Chao
SKLP, ICT
computer science
Yonghao Wang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
Bin Sun
School of Computer Science & Technology, Beijing Institute of Technology
natural language processing, open-domain dialogue generation
Tianyun Ma
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Tianmeng Yang
Baidu ERNIE, Peking University
LLM, RL, Machine Learning, Data Mining
Jianan Mu
Institute of Computing Technology, State Key Laboratory of Processors (SKLP), CAS
Design Automation, Accelerator, Privacy-Preserving Computing
Jing Justin Ye
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; CASTEST Co., Ltd.
Huawei Li
Institute of Computing Technology, Chinese Academy of Sciences
computer engineering