VerilogMonkey: Exploring Parallel Scaling for Automated Verilog Code Generation with LLMs

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the impact of parallel scaling on large language models' (LLMs) performance in automated Verilog code generation. Addressing the limited functional correctness of LLM-generated hardware designs, attributed to output randomness and single-sample decoding bias, we propose a lightweight parallel sampling method that requires no fine-tuning or post-training and generates many candidate solutions concurrently. We systematically evaluate this approach across mainstream open-weight LLMs (e.g., Llama-3, Qwen2) and established benchmarks (VerilogEval, HDLBits), demonstrating a strong positive correlation between parallel scale (up to hundreds of samples) and generation quality. Crucially, we identify and empirically validate the mechanism by which statistical aggregation over parallel samples suppresses stochasticity and improves functional correctness. Experiments show that our method achieves substantial gains under controllable latency and computational cost, outperforming state-of-the-art LLM-based Verilog generators, with up to a 27.4% absolute improvement in functional correctness.

📝 Abstract
We present VerilogMonkey, an empirical study of parallel scaling for the under-explored task of automated Verilog generation. Parallel scaling improves LLM performance by sampling many outputs in parallel. Across multiple benchmarks and mainstream LLMs, we find that scaling to hundreds of samples is cost-effective in both time and money and, even without any additional enhancements such as post-training or agentic methods, surpasses prior results on LLM-based Verilog generation. We further dissect why parallel scaling delivers these gains and show how output randomness in LLMs affects its effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Exploring parallel scaling for automated Verilog code generation
Improving LLM performance through parallel output sampling
Analyzing how output randomness affects Verilog generation effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel scaling that samples many LLM outputs concurrently
Cost-effective scaling to hundreds of samples in both time and money
Analysis of how output randomness affects generation effectiveness
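The core recipe, drawing many independent samples in parallel and measuring functional correctness with the standard unbiased pass@k estimator, can be sketched as follows. This is an illustrative sketch, not the paper's actual pipeline: `sample_fn` is a hypothetical stand-in for one LLM decoding call followed by a testbench/simulation check, and the demo stubs it out.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn without replacement from n candidates of which c are
    functionally correct, passes."""
    if n - c < k:
        return 1.0
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

def generate_candidates(prompt: str, n: int, sample_fn, workers: int = 32):
    """Draw n independent candidate solutions in parallel.
    sample_fn(prompt, i) is a hypothetical stand-in for one LLM decoding
    call plus a testbench check; it returns True when the generated
    Verilog passes simulation."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: sample_fn(prompt, i), range(n)))

# Demo with a stubbed sampler: every 5th sample "passes the testbench".
def fake_sample(prompt: str, i: int) -> bool:
    return i % 5 == 0

results = generate_candidates("module adder(...);", 100, fake_sample)
n, c = len(results), sum(results)
print(f"pass@1={pass_at_k(n, c, 1):.3f}  pass@10={pass_at_k(n, c, 10):.3f}")
# prints: pass@1=0.200  pass@10=0.905
```

The sketch shows why hundreds of parallel samples help: even with a 20% per-sample success rate, the chance that at least one of 10 samples is correct exceeds 90%, and it keeps climbing with the sample budget.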
Juxin Niu
Department of Computer Science, City University of Hong Kong, Hong Kong SAR
Yuxin Du
Department of Computer Science, City University of Hong Kong, Hong Kong SAR
Dan Niu
School of Automation, Southeast University, China
Xi Wang
National Center of Technology Innovation for EDA, China; School of Integrated Circuits, Southeast University, China
Zhe Jiang
National Center of Technology Innovation for EDA, China; School of Integrated Circuits, Southeast University, China
Nan Guan
City University of Hong Kong
Cyber-Physical systems · Embedded systems · Real-time systems