🤖 AI Summary
Verification consumes nearly 70% of total chip development effort. Building UVM testbenches relies heavily on expert knowledge and is slowed by extensive manual coding, repetitive EDA toolchain operations, and the difficulty of generating sufficient stimuli. This paper proposes an LLM-driven, closed-loop UVM testbench generation framework that integrates RTL parsing, coverage-feedback-guided iterative refinement, and automatic test stimulus synthesis to produce executable verification environments end to end from RTL designs. Evaluated on RTL benchmarks of up to 1.6K lines of code, the framework reduces testbench construction time by up to 2× compared to experienced engineers and achieves average code and functional coverage of 87.44% and 89.58%, respectively, improving on the state of the art by 20.96% and 23.51%. These results suggest that much of the manual work in conventional, engineer-driven testbench construction can be automated.
📝 Abstract
Verification presents a major bottleneck in Integrated Circuit (IC) development, consuming nearly 70% of the total development effort. While the Universal Verification Methodology (UVM) is widely used in industry to improve verification efficiency through structured and reusable testbenches, constructing these testbenches and generating sufficient stimuli remain challenging. These challenges arise from the considerable manual coding effort required, the repetitive manual execution of multiple EDA tools, and the need for in-depth domain expertise to navigate complex designs. Here, we present UVM^2, an automated verification framework that leverages Large Language Models (LLMs) to generate UVM testbenches and iteratively refine them using coverage feedback, significantly reducing manual effort while maintaining rigorous verification standards. To evaluate UVM^2, we introduce a benchmark suite comprising Register Transfer Level (RTL) designs of up to 1.6K lines of code. The results show that UVM^2 reduces testbench setup time by up to UVM^2 compared to experienced engineers, and achieves average code and functional coverage of 87.44% and 89.58%, outperforming state-of-the-art solutions by 20.96% and 23.51%, respectively.
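The coverage-feedback refinement described in the abstract can be pictured as a simple loop: generate a testbench, simulate it, read back which coverage bins were missed, and regenerate with that feedback. The Python sketch below illustrates only this control flow and is not the paper's implementation. The function names, the `target` and `max_iters` parameters, and the toy `toy_generate` and `toy_simulate` stand-ins (which replace the LLM call and the EDA simulation run) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical coverage bins for a toy design (not from the paper).
ALL_BINS = ["reset", "read", "write", "overflow"]


def refine_until_covered(
    generate: Callable[[List[str]], str],
    simulate: Callable[[str], Tuple[float, List[str]]],
    target: float = 0.9,
    max_iters: int = 5,
) -> Tuple[str, float, int]:
    """Regenerate a testbench until coverage reaches `target` or the budget ends.

    Returns the best testbench seen, its coverage, and the iterations used.
    """
    uncovered: List[str] = []
    best_tb, best_cov = "", 0.0
    for iteration in range(1, max_iters + 1):
        tb = generate(uncovered)          # assumed stand-in for an LLM call
        cov, uncovered = simulate(tb)     # assumed stand-in for an EDA run
        if cov > best_cov:
            best_tb, best_cov = tb, cov
        if best_cov >= target:
            return best_tb, best_cov, iteration
    return best_tb, best_cov, max_iters


def toy_generate(uncovered: List[str]) -> str:
    """Toy generator: starts with two bins, then closes one reported gap per call."""
    if not uncovered:
        targeted = ALL_BINS[:2]
    else:
        still_skipped = set(uncovered[1:])
        targeted = [b for b in ALL_BINS if b not in still_skipped]
    return ",".join(targeted)


def toy_simulate(tb: str) -> Tuple[float, List[str]]:
    """Toy simulator: coverage is the fraction of bins the testbench names."""
    hit = set(tb.split(","))
    missed = [b for b in ALL_BINS if b not in hit]
    return len(hit & set(ALL_BINS)) / len(ALL_BINS), missed
```

With the toy stand-ins, `refine_until_covered(toy_generate, toy_simulate)` reaches full coverage on the third iteration. In a real flow, the iteration budget matters because each pass costs an LLM call plus a simulation run, so the loop keeps the best result seen rather than assuming the last attempt is the best.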