RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs

📅 2025-07-21
🤖 AI Summary
Existing Verilog generation benchmarks are overly simplified and fail to replicate real-world IP-level design workflows: their designs are simple, their specifications are inadequate, and their verification environments lack rigor. Method: We introduce RealBench, the first benchmark targeting real-world IP-level Verilog generation. It comprises complex, structured open-source IP designs, multi-modal and formatted design specifications, and a rigorous verification environment that combines testbenches with 100% line coverage and a formal checker. RealBench supports evaluation at both module and system levels. Contribution/Results: Experiments on various LLMs and agents show clear limitations in current Verilog generation: o1-preview, one of the best-performing LLMs, achieves only 13.3% pass@1 on module-level tasks and 0% on system-level tasks. RealBench is open-sourced, giving future work on LLM-driven hardware design a more realistic evaluation target.

📝 Abstract
The automatic generation of Verilog code using Large Language Models (LLMs) has garnered significant interest in hardware design automation. However, existing benchmarks for evaluating LLMs in Verilog generation fall short in replicating real-world design workflows due to their designs' simplicity, inadequate design specifications, and less rigorous verification environments. To address these limitations, we present RealBench, the first benchmark aiming at real-world IP-level Verilog generation tasks. RealBench features complex, structured, real-world open-source IP designs, multi-modal and formatted design specifications, and rigorous verification environments, including 100% line coverage testbenches and a formal checker. It supports both module-level and system-level tasks, enabling comprehensive assessments of LLM capabilities. Evaluations on various LLMs and agents reveal that even one of the best-performing LLMs, o1-preview, achieves only a 13.3% pass@1 on module-level tasks and 0% on system-level tasks, highlighting the need for stronger Verilog generation models in the future. The benchmark is open-sourced at https://github.com/IPRC-DIP/RealBench.
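The pass@1 figures above can be read through the widely used unbiased pass@k estimator, in which n samples are generated per task and c of them pass verification. The sketch below is an illustration only: the abstract does not state how RealBench computes pass@1, and the function names and sample counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: samples generated for the task
    c: samples that passed verification
    k: number of attempts the metric allows
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that k samples drawn without replacement all fail,
    # subtracted from 1. comb() returns 0 when n - c < k.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; results holds (n, c) pairs (hypothetical data)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts: three tasks, 10 samples each.
    print(benchmark_pass_at_k([(10, 1), (10, 0), (10, 2)], k=1))
```

For k = 1 the estimator reduces to c/n per task, so a benchmark-level pass@1 is the mean fraction of passing samples across tasks.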
Problem

Research questions and friction points this paper is trying to address.

Benchmarking Verilog generation models for real-world IP designs
Addressing simplicity and inadequate specs in existing benchmarks
Evaluating LLMs on complex module and system-level tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

RealBench benchmark for real-world Verilog generation
Multi-modal formatted design specifications
Rigorous verification with full coverage testbenches
Pengwei Jin
State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China
Di Huang
State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China
Chongxiao Li
ICT, CAS
Computer Architecture
Shuyao Cheng
State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China
Yang Zhao
State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Cambricon Technologies
Xinyao Zheng
University of California Riverside
Jiaguo Zhu
State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China; University of Science and Technology of China, Hefei, China; Cambricon Technologies
Shuyi Xing
State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China; University of Science and Technology of China, Hefei, China; Cambricon Technologies
Bohan Dou
University of Science and Technology of China
LLM
Rui Zhang
State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China
Zidong Du
State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China; Shanghai Innovation Center for Processor Technologies, Shanghai, China
Qi Guo
State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China
Xing Hu
State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China; Shanghai Innovation Center for Processor Technologies, Shanghai, China