VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation

📅 2025-05-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) face significant challenges in generating synthesizable RTL code: scarcity of high-quality training data, poor alignment between specifications and generated Verilog, lack of built-in verification, and weak generalization to unseen circuit architectures. Method: This paper proposes a specification-driven, verifiable Verilog generation framework. It is the first to integrate explicit reasoning with GRPO (Guided Reward Proximal Optimization) reinforcement learning, incorporating testbench execution feedback and structure-aware reward modeling. Key innovations include self-verifying decoding and autonomous correction mechanisms. The technical pipeline comprises supervised fine-tuning (SFT), GRPO-based optimization, testbench-driven reward design, and structural heuristic reward functions. Results: On the VerilogEval benchmark, the framework achieves 83.1% functional correctness, surpassing GPT-4 Turbo, and improves first-attempt correctness by up to 2.8x. It demonstrates strong generalization to novel, unseen circuit designs, validating both correctness and robustness.
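The pipeline's reward design blends two signals: a functional score from testbench execution and a structural heuristic score. A minimal sketch of such a blended reward is below; the function name, weights, and signature are illustrative assumptions, not the paper's exact formulation.

```python
def combined_reward(tests_passed: int, tests_total: int,
                    structure_score: float,
                    w_func: float = 0.8, w_struct: float = 0.2) -> float:
    """Blend testbench pass rate with a structural-similarity heuristic.

    NOTE: weights and interface are illustrative assumptions; the paper
    does not specify these exact values.
    """
    # Functional signal: fraction of testbench cases the generated RTL passes.
    functional = tests_passed / tests_total if tests_total else 0.0
    # Structural signal: heuristic similarity to reference design structure,
    # assumed normalized to [0, 1].
    return w_func * functional + w_struct * structure_score
```

A design like this lets testbench correctness dominate the reward while the structural term still gives dense feedback when few tests pass.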

📝 Abstract
Automating Register Transfer Level (RTL) code generation using Large Language Models (LLMs) offers substantial promise for streamlining digital circuit design and reducing human effort. However, current LLM-based approaches face significant challenges with training data scarcity, poor specification-code alignment, lack of verification mechanisms, and balancing generalization with specialization. Inspired by DeepSeek-R1, we introduce VeriReason, a framework integrating supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning for RTL generation. Using curated training examples and a feedback-driven reward model, VeriReason combines testbench evaluations with structural heuristics while embedding self-checking capabilities for autonomous error correction. On the VerilogEval Benchmark, VeriReason delivers significant improvements: achieving 83.1% functional correctness on the VerilogEval Machine benchmark, substantially outperforming both comparable-sized models and much larger commercial systems like GPT-4 Turbo. Additionally, our approach demonstrates up to a 2.8X increase in first-attempt functional correctness compared to baseline methods and exhibits robust generalization to unseen designs. To our knowledge, VeriReason represents the first system to successfully integrate explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state-of-the-art for automated RTL synthesis. The models and datasets are available at: https://huggingface.co/collections/AI4EDA-CASE Code is Available at: https://github.com/NellyW8/VeriReason
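GRPO-style training typically samples a group of candidate completions per prompt and scores each one, normalizing rewards within the group to obtain advantages. A minimal sketch of that group-relative normalization step, under the assumption that VeriReason follows this standard recipe:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Mean-center and std-scale each sampled completion's reward
    against its own group, as in GRPO-style policy optimization.

    The epsilon guard against zero variance is an implementation
    assumption, not taken from the paper.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    eps = 1e-8
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions whose RTL passes more testbench cases than their group peers receive positive advantages and are reinforced; below-average ones are penalized.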
Problem

Research questions and friction points this paper is trying to address.

Automating RTL code generation with LLMs faces training data scarcity and specification-code alignment issues
Current methods lack built-in verification and balance generalization with specialization poorly
Integrating explicit reasoning with reinforcement learning to improve Verilog generation accuracy remains unexplored
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates GRPO reinforcement learning for RTL generation
Uses testbench feedback for autonomous error correction
Combines supervised fine-tuning with structural heuristics
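The self-checking loop implied by these contributions (generate, simulate against the testbench, feed failures back for correction) can be sketched as below. `generate` and `simulate` are hypothetical callables standing in for the LLM and an RTL simulator; the paper's actual interfaces may differ.

```python
def generate_with_feedback(spec, generate, simulate, max_attempts=3):
    """Iteratively regenerate RTL using testbench feedback.

    `generate(spec, feedback)` returns candidate Verilog; `simulate(code)`
    returns (passed, failure_report). Both are illustrative stand-ins.
    """
    feedback = None
    code = None
    for _ in range(max_attempts):
        # On the first pass feedback is None; later passes see failure reports.
        code = generate(spec, feedback)
        passed, feedback = simulate(code)
        if passed:
            return code
    return code  # best effort after max_attempts
```

The key point is that the simulator's failure report, not just a pass/fail bit, is fed back to the generator so the next attempt can target the observed mismatch.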