CodeV-R1: Reasoning-Enhanced Verilog Generation

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three core challenges in natural-language (NL)-to-Verilog code generation: the absence of a verifiable training environment, the scarcity of high-quality NL-code parallel data, and the prohibitively high computational cost of reinforcement learning with verifiable reward (RLVR). Methodologically, the authors introduce (i) a rule-based testbench generator enabling automated equivalence checking; (ii) a code-language-code closed-loop data synthesis framework, coupled with bidirectional knowledge distillation, to alleviate data scarcity; and (iii) adaptive DAPO, a novel adaptive-sampling RLVR algorithm, within a two-stage "distill-then-RL" training paradigm. The resulting CodeV-R1-7B model achieves 68.6% and 72.9% pass@1 on VerilogEval v2 and RTLLM v1.1, respectively, surpassing the prior state of the art by 12-20% and matching the performance of 671B-scale models. This significantly advances the reliability and efficiency of NL-to-HDL synthesis for EDA applications.
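The cost saving from adaptive sampling can be illustrated with the dynamic-sampling idea behind DAPO-style RLVR training: prompt groups whose rollouts are all correct or all wrong carry zero advantage under a group-normalized baseline, so they can be skipped before the policy update. A minimal toy sketch in Python (the reward values and the `filter_groups` helper are illustrative, not the authors' implementation):

```python
# Toy sketch of dynamic sampling in DAPO-style RLVR training:
# rollout groups with uniform rewards (all pass or all fail) yield zero
# group-normalized advantage, so they are filtered out before the policy
# update, saving gradient computation.

def filter_groups(rollout_groups):
    """Keep only prompt groups with mixed (informative) rewards."""
    kept = []
    for rewards in rollout_groups:
        if len(set(rewards)) > 1:   # mixed pass/fail -> nonzero advantage
            kept.append(rewards)
    return kept

groups = [
    [1.0, 1.0, 1.0, 1.0],  # all pass -> skipped
    [0.0, 1.0, 0.0, 1.0],  # mixed    -> kept
    [0.0, 0.0, 0.0, 0.0],  # all fail -> skipped
]
print(filter_groups(groups))  # [[0.0, 1.0, 0.0, 1.0]]
```

The "adaptive" part of the paper's algorithm additionally adjusts how many rollouts are sampled per prompt; the filtering above only shows why uniform-reward groups are wasted compute.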

📝 Abstract
Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification, such as software programming and mathematical problems. Extending RLVR to electronic design automation (EDA), especially automatically generating hardware description languages (HDLs) like Verilog from natural-language (NL) specifications, however, poses three key challenges: the lack of automated and accurate verification environments, the scarcity of high-quality NL-code pairs, and the prohibitive computation cost of RLVR. To this end, we introduce CodeV-R1, an RLVR framework for training Verilog generation LLMs. First, we develop a rule-based testbench generator that performs robust equivalence checking against golden references. Second, we propose a round-trip data synthesis method that pairs open-source Verilog snippets with LLM-generated NL descriptions, verifies code-NL-code consistency via the generated testbench, and filters out inequivalent examples to yield a high-quality dataset. Third, we employ a two-stage "distill-then-RL" training pipeline: distillation for the cold start of reasoning abilities, followed by adaptive DAPO, our novel RLVR algorithm that can reduce training cost by adaptively adjusting sampling rate. The resulting model, CodeV-R1-7B, achieves 68.6% and 72.9% pass@1 on VerilogEval v2 and RTLLM v1.1, respectively, surpassing prior state-of-the-art by 12-20%, while matching or even exceeding the performance of 671B DeepSeek-R1. We will release our model, training pipeline, and dataset to facilitate research in EDA and LLM communities.
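The testbench-based equivalence check described in the abstract doubles as the verifiable reward: a candidate design scores 1 only if it matches the golden reference on every generated stimulus. A minimal toy sketch in Python, where `candidate_sim`/`golden_sim` are hypothetical stand-ins for simulating a Verilog module against testbench stimuli (the real pipeline would invoke an HDL simulator):

```python
# Toy sketch of a verifiable reward via testbench-based equivalence checking.
# In the paper this is done by simulating Verilog against a rule-based
# testbench; here "simulation" is a stand-in function mapping a stimulus
# to an output value.

def equivalence_reward(candidate_sim, golden_sim, stimuli):
    """Return 1.0 iff the candidate matches the golden reference on all stimuli."""
    for s in stimuli:
        if candidate_sim(s) != golden_sim(s):
            return 0.0
    return 1.0

# Example: a 2-input AND gate as the golden design.
golden = lambda ab: ab[0] & ab[1]
good = lambda ab: ab[0] and ab[1]   # equivalent implementation
buggy = lambda ab: ab[0] | ab[1]    # OR instead of AND

stimuli = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(equivalence_reward(good, golden, stimuli))   # 1.0
print(equivalence_reward(buggy, golden, stimuli))  # 0.0
```

The binary reward is what makes the environment "verifiable": no learned judge is needed, only simulation against the rule-generated testbench.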
Problem

Research questions and friction points this paper is trying to address.

Automated Verilog generation lacks robust verification environments
Scarcity of high-quality natural-language to Verilog datasets
High computational cost of reinforcement learning with verifiable reward (RLVR)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rule-based testbench generator for equivalence checking
Round-trip data synthesis for high-quality NL-code pairs
Two-stage distill-then-RL training with adaptive DAPO
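The round-trip synthesis bullet above can be sketched as a filter: describe each code snippet in NL, regenerate code from that description, and keep the pair only if the two programs are equivalent. A toy Python sketch, where `describe`, `regenerate`, and `equivalent` are hypothetical stubs for the LLM calls and the testbench-based equivalence check:

```python
# Toy sketch of round-trip (code -> NL -> code) data filtering.
# describe() and regenerate() stand in for LLM calls; equivalent() stands in
# for testbench-based equivalence checking. All three are illustrative stubs.

def round_trip_filter(snippets, describe, regenerate, equivalent):
    """Keep only (NL, code) pairs whose regenerated code matches the original."""
    dataset = []
    for code in snippets:
        nl = describe(code)          # code -> NL description
        code2 = regenerate(nl)       # NL -> code
        if equivalent(code, code2):  # equivalence check via testbench
            dataset.append((nl, code))
    return dataset

# Stub demo: treat whitespace-normalized equality as "equivalence".
describe = lambda c: f"module that computes: {c}"
regenerate = lambda nl: nl.split(": ", 1)[1]
equivalent = lambda a, b: a.split() == b.split()

print(round_trip_filter(["assign y = a & b;"], describe, regenerate, equivalent))
```

The filter discards pairs where the NL description is too lossy to reconstruct equivalent code, which is what yields the high-quality dataset the abstract describes.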
👥 Authors

Yaoyu Zhu
SKL of Processors, Institute of Computing Technology, CAS

Di Huang
SKL of Processors, Institute of Computing Technology, CAS

Hanqi Lyu
SKL of Processors, Institute of Computing Technology, CAS; University of Science and Technology of China

Xiaoyun Zhang
SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences

Chongxiao Li
ICT, CAS
Computer Architecture

Wenxuan Shi
SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences

Yutong Wu
SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences

Jianan Mu
Institute of Computing Technology, State Key Laboratory of Processors (SKLP), CAS
Design Automation, Accelerator, Privacy-Preserving Computing

Jinghua Wang
Harbin Institute of Technology, Shenzhen
Computer Vision, Multimodal Learning, Machine Learning

Yang Zhao
SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences

Pengwei Jin
SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences

Shuyao Cheng
SKL of Processors, Institute of Computing Technology, CAS

Shengwen Liang
Institute of Computing Technology, Chinese Academy of Sciences
Accelerator, Cognitive SSD, System

Xishan Zhang
Institute of Computing Technology of the Chinese Academy of Sciences

Rui Zhang
SKL of Processors, Institute of Computing Technology, CAS

Zidong Du
SKL of Processors, Institute of Computing Technology, CAS

Qi Guo
SKL of Processors, Institute of Computing Technology, CAS

Xing Hu
SKL of Processors, Institute of Computing Technology, CAS

Yunji Chen
Institute of Computing Technology, Chinese Academy of Sciences
Processor Architecture, Microarchitecture, Machine Learning