AutoVeriFix: Automatically Correcting Errors and Enhancing Functional Correctness in LLM-Generated Verilog Code

📅 2025-09-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current large language models (LLMs) often generate Verilog code that is syntactically correct but functionally wrong, largely because high-quality training data is scarce. To address this, we propose AutoVeriFix, a Python-assisted, two-stage framework. First, it automatically synthesizes a high-level, executable Python reference model and generates corresponding automated test stimuli. Second, it identifies functional discrepancies by comparing RTL simulation outputs against the Python model's behavior, then feeds those discrepancies back to the LLM to iteratively refine the generated Verilog. The core innovation is the use of lightweight, executable Python models as functional golden references, which enables a closed-loop, simulation-driven feedback mechanism and overcomes the limitation of conventional approaches that rely solely on syntactic validation. Experiments across diverse digital circuit design tasks show that AutoVeriFix significantly improves functional correctness, achieving an average absolute gain of 23.6% over state-of-the-art methods.

📝 Abstract

Large language models (LLMs) have demonstrated impressive capabilities in generating software code for high-level programming languages such as Python and C++. However, their application to hardware description languages, such as Verilog, is challenging due to the scarcity of high-quality training data. Current approaches to Verilog code generation using LLMs often focus on syntactic correctness, resulting in code with functional errors. To address these challenges, we present AutoVeriFix, a novel Python-assisted two-stage framework designed to enhance the functional correctness of LLM-generated Verilog code. In the first stage, LLMs are employed to generate high-level Python reference models that define the intended circuit behavior. In the second stage, these Python models facilitate the creation of automated tests that guide the generation of Verilog RTL implementations. Simulation discrepancies between the reference model and the Verilog code are iteratively used to identify and correct errors, thereby improving the functional accuracy and reliability of the LLM-generated Verilog code. Experimental results demonstrate that our approach significantly outperforms existing state-of-the-art methods in improving the functional correctness of generated Verilog code.
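To make the idea of a Python reference model concrete, here is a minimal sketch. It is not taken from the paper: the 4-bit counter, the class `Counter4Ref`, and the helpers `make_stimuli`, `expected_trace`, and `find_discrepancies` are all hypothetical names chosen for illustration. The sketch assumes that the Verilog simulation output has already been parsed into a list of integers, one per clock cycle.

```python
import random
from dataclasses import dataclass


@dataclass
class Counter4Ref:
    """Hypothetical reference model: 4-bit up-counter with synchronous reset and enable."""
    count: int = 0

    def step(self, reset: int, enable: int) -> int:
        """Advance one clock cycle and return the counter value after the edge."""
        if reset:
            self.count = 0
        elif enable:
            self.count = (self.count + 1) & 0xF  # wrap around at 16
        return self.count


def make_stimuli(n_cycles: int, seed: int = 0) -> list:
    """Generate (reset, enable) pairs; the first cycle always asserts reset."""
    rng = random.Random(seed)
    stimuli = [(1, 0)]
    stimuli += [(int(rng.random() < 0.1), rng.randint(0, 1)) for _ in range(n_cycles - 1)]
    return stimuli


def expected_trace(stimuli: list) -> list:
    """Run the reference model over the stimuli and collect its outputs."""
    model = Counter4Ref()
    return [model.step(reset, enable) for reset, enable in stimuli]


def find_discrepancies(stimuli: list, dut_trace: list) -> list:
    """Compare a simulated trace (assumed already parsed) against the reference."""
    expected = expected_trace(stimuli)
    return [
        {"cycle": i, "inputs": stimuli[i], "expected": exp, "actual": act}
        for i, (exp, act) in enumerate(zip(expected, dut_trace))
        if exp != act
    ]


# Directed example: reset once, then count for 17 cycles so the counter wraps.
stimuli = [(1, 0)] + [(0, 1)] * 17
buggy_trace = list(range(16)) + [15, 15]  # a design that saturates instead of wrapping
for mismatch in find_discrepancies(stimuli, buggy_trace):
    print(mismatch)
```

In this toy case the saturating design disagrees with the reference at cycles 16 and 17, and each mismatch record carries the inputs, expected value, and actual value that could be reported back to the model generating the Verilog.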

Problem

Research questions and friction points this paper is trying to address.

Improving functional correctness of LLM-generated Verilog code

Addressing functional errors in hardware description languages

Automatically correcting errors in LLM-generated circuit designs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates Python reference models for circuit behavior

Uses automated tests to guide Verilog implementation

Iteratively corrects errors via simulation discrepancies
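The iterative correction step can be pictured as a bounded feedback loop. The sketch below is an illustration under stated assumptions, not the paper's implementation: `refine_until_match` is a hypothetical helper, and the `generate` and `simulate` callables stand in for an LLM call and an RTL simulator run, both of which are outside this example. The toy stand-ins at the bottom exist only to make the loop runnable.

```python
from typing import Callable, List, Optional, Tuple


def refine_until_match(
    generate: Callable[[Optional[str]], str],
    simulate: Callable[[str], List[int]],
    expected: List[int],
    max_iters: int = 5,
) -> Tuple[Optional[str], int]:
    """Regenerate a candidate until its simulated trace matches the reference trace.

    generate(feedback) -> candidate source (stand-in for an LLM call)
    simulate(candidate) -> output trace (stand-in for an RTL simulation)
    Returns (candidate, attempts) on success, or (None, max_iters) on failure.
    """
    feedback = None
    for attempt in range(1, max_iters + 1):
        candidate = generate(feedback)
        actual = simulate(candidate)

        # Collect human-readable mismatch lines to send back as feedback.
        problems = [
            f"cycle {i}: expected {e}, got {a}"
            for i, (e, a) in enumerate(zip(expected, actual))
            if e != a
        ]
        if len(actual) != len(expected):
            problems.append(f"trace length {len(actual)}, expected {len(expected)}")

        if not problems:
            return candidate, attempt
        feedback = "; ".join(problems[:5])  # cap the feedback size
    return None, max_iters


# Toy stand-ins (hypothetical): the first attempt is wrong, the retry is right.
def toy_generate(feedback: Optional[str]) -> str:
    return "buggy" if feedback is None else "fixed"


def toy_simulate(candidate: str) -> List[int]:
    return [0, 1, 2, 3] if candidate == "fixed" else [0, 1, 1, 3]


result, attempts = refine_until_match(toy_generate, toy_simulate, [0, 1, 2, 3])
print(result, attempts)
```

Capping the number of iterations and the amount of feedback keeps the loop bounded; how the actual framework chooses these limits is not stated in the text above.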
