🤖 AI Summary
This work addresses the challenge of rapidly and accurately predicting post-synthesis performance metrics—area, delay, and static power—for Verilog hardware designs. To this end, we introduce MetRex, the first domain-specific benchmark for EDA-driven performance prediction, comprising 25,868 design–metric pairs and a hardware-aware chain-of-thought (CoT) prompting template. We present the first systematic evaluation of large language models (LLMs) for quantitative EDA analysis, proposing an end-to-end regression framework that eliminates dependence on commercial EDA tools. Our approach integrates Verilog parsing, supervised fine-tuning (SFT), and hardware-customized CoT prompting. Experimental results demonstrate that SFT improves mean prediction accuracy by 25.3%–37.0% across metrics; compared to conventional regression models, our method accurately predicts 17.4% more designs within a 5% error tolerance and accelerates inference by 1.7×.
📝 Abstract
Large Language Models (LLMs) have been applied to various hardware design tasks, including Verilog code generation, EDA tool scripting, and RTL bug fixing. Despite this extensive exploration, LLMs have not yet been applied to post-synthesis metric reasoning and estimation for HDL designs. In this paper, we assess the ability of LLMs to reason about the post-synthesis metrics of Verilog designs. We introduce MetRex, a large-scale dataset comprising 25,868 Verilog HDL designs and their corresponding post-synthesis metrics, namely area, delay, and static power. MetRex incorporates a Chain-of-Thought (CoT) template to enhance LLMs' reasoning about these metrics. Extensive experiments show that Supervised Fine-Tuning (SFT) boosts LLMs' reasoning accuracy by 37.0%, 25.3%, and 25.7% on average for area, delay, and static power, respectively. While SFT improves performance on our benchmark, it remains far from optimal, especially on complex problems. Compared to state-of-the-art regression models, our approach delivers accurate post-synthesis predictions for 17.4% more designs (within a 5% error margin), while also offering a 1.7× speedup by eliminating the need for pre-processing. This work lays the groundwork for advancing LLM-based Verilog code metric reasoning.
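To make the idea of a hardware-aware CoT template concrete, the sketch below shows what such a prompt might look like. This is a minimal, hypothetical illustration, not MetRex's actual template: the wording, reasoning steps, and the `build_prompt` helper are all assumptions introduced here for clarity.

```python
# Hypothetical hardware-aware chain-of-thought (CoT) prompt template for
# post-synthesis metric reasoning. The step wording and units are
# illustrative assumptions, not MetRex's actual template.
COT_TEMPLATE = """You are an EDA assistant. Given the Verilog module below,
reason step by step about its post-synthesis metrics.

Verilog:
{verilog}

Steps:
1. Enumerate the combinational and sequential elements the synthesizer
   would infer (gates, adders, multiplexers, flip-flops).
2. Estimate the standard cells each element maps to and sum their areas.
3. Identify the longest register-to-register path and estimate its delay.
4. Sum per-cell leakage to estimate static power.

Answer with: area = <um^2>, delay = <ns>, static power = <uW>.
"""

def build_prompt(verilog_src: str) -> str:
    """Fill the CoT template with a design's Verilog source."""
    return COT_TEMPLATE.format(verilog=verilog_src)
```

In an SFT setting, each training example would pair such a prompt with a reasoning trace ending in the ground-truth area, delay, and static power from the dataset.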