Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback

📅 2025-04-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Training LLMs to generate functionally correct Verilog suffers from a severe scarcity of high-quality, semantically accurate verification data. Method: We propose an end-to-end verification-driven reinforcement learning framework for Verilog generation. It integrates VCS-based simulation feedback into both automatic testbench generation and large language model (LLM) training, deriving preference pairs directly from functional verification outcomes so that the model's optimization objective matches the design goal of functional correctness. Technically, the framework unifies automatic testbench generation, VCS compilation and simulation, direct preference optimization (DPO), and LLM fine-tuning. Results: The method achieves state-of-the-art performance across five benchmarks (VerilogEval-Machine, VerilogEval-Human, RTLLM v1.1, RTLLM v2, and VerilogEval v2), with substantial gains in functional correctness rate. All code, datasets, and models are publicly released.

📝 Abstract
Large language models (LLMs) have shown strong performance in Verilog generation from natural language descriptions. However, ensuring the functional correctness of the generated code remains a significant challenge. This paper introduces a method that integrates verification insights from testbenches into the training of Verilog generation LLMs, aligning the training with the fundamental goal of hardware design: functional correctness. The main obstacle in using LLMs for Verilog code generation is the lack of sufficient functional verification data, particularly testbenches paired with design specifications and code. To address this problem, we introduce an automatic testbench generation pipeline that decomposes the process and uses feedback from the Verilog compiler simulator (VCS) to reduce hallucination and ensure correctness. We then use these testbenches to evaluate generated code samples and collect the results for further training, introducing verification insights. Our method applies reinforcement learning (RL), specifically direct preference optimization (DPO), to align Verilog code generation with functional correctness by training on preference pairs constructed from testbench outcomes. In evaluations on VerilogEval-Machine, VerilogEval-Human, RTLLM v1.1, RTLLM v2, and VerilogEval v2, our approach consistently outperforms state-of-the-art baselines in generating functionally correct Verilog code. We open source all training code, data, and models at https://anonymous.4open.science/r/VeriPrefer-E88B.
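The preference-pair construction and DPO objective described above can be sketched as follows. This is a minimal illustration, not the paper's released code: the function names, the exhaustive pass/fail pairing strategy, and the beta value are assumptions.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    Here the "chosen" completion is a Verilog candidate that passed the
    auto-generated testbench and the "rejected" one failed. The log-probs
    come from the policy model and a frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def build_preference_pairs(candidates):
    """Pair passing and failing candidates sampled for the same spec.

    `candidates` is a list of (code, passed) tuples obtained by simulating
    each sampled completion against the generated testbench.
    """
    passed = [code for code, ok in candidates if ok]
    failed = [code for code, ok in candidates if not ok]
    return [(winner, loser) for winner in passed for loser in failed]
```

With equal policy and reference log-probs the margin is zero and the loss is log 2; the loss shrinks as the policy favors the passing candidate over the failing one.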
Problem

Research questions and friction points this paper is trying to address.

Ensuring functional correctness of LLM-generated Verilog code
Addressing lack of verification data for Verilog generation
Aligning Verilog generation with testbench feedback via RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning for Verilog generation
Automatic testbench generation pipeline
Testbench feedback ensures functional correctness
Ning Wang (City University of Hong Kong)
Bingkun Yao (City University of Hong Kong; EDA)
Jie Zhou (Southeast University)
Yuchen Hu (Southeast University)
Xi Wang (Southeast University)
Nan Guan (City University of Hong Kong; Cyber-Physical systems, Embedded systems, Real-time systems)
Zhe Jiang (Southeast University)