🤖 AI Summary
Current large language models (LLMs) often generate Verilog code that is syntactically correct but functionally wrong, largely because high-quality training data is scarce. To address this, the authors propose AutoVeriFix, a Python-assisted, two-stage framework. In the first stage, the LLM synthesizes a high-level, executable Python reference model of the intended circuit behavior. In the second stage, that model is used to create automated tests; functional discrepancies are identified by comparing RTL simulation outputs against the Python model's behavior, and the mismatches are fed back to the LLM to iteratively refine the generated Verilog. The core innovation is the use of lightweight, executable Python models as functional golden references, which enables a closed-loop, simulation-driven feedback mechanism and overcomes the limitation of conventional approaches that rely solely on syntactic validation. Experiments across diverse digital circuit design tasks show that AutoVeriFix significantly improves functional correctness, achieving an average 23.6% absolute gain over state-of-the-art methods, while maintaining high reliability and engineering practicality.
📝 Abstract
Large language models (LLMs) have demonstrated impressive capabilities in generating software code for high-level programming languages such as Python and C++. However, their application to hardware description languages, such as Verilog, is challenging due to the scarcity of high-quality training data. Current approaches to Verilog code generation using LLMs often focus on syntactic correctness, resulting in code with functional errors. To address these challenges, we present AutoVeriFix, a novel Python-assisted two-stage framework designed to enhance the functional correctness of LLM-generated Verilog code. In the first stage, LLMs are employed to generate high-level Python reference models that define the intended circuit behavior. In the second stage, these Python models facilitate the creation of automated tests that guide the generation of Verilog RTL implementations. Simulation discrepancies between the reference model and the Verilog code are iteratively used to identify and correct errors, thereby improving the functional accuracy and reliability of the LLM-generated Verilog code. Experimental results demonstrate that our approach significantly outperforms existing state-of-the-art methods in improving the functional correctness of generated Verilog code.
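To make the two-stage idea concrete, the following is a minimal sketch of the comparison loop, not the authors' implementation. It assumes a toy 4-bit adder as the design under test. All names (`reference_adder`, `buggy_rtl_adder`, `generate_stimuli`, `find_discrepancies`, `format_feedback`) are hypothetical, and the RTL simulation is replaced by a plain Python stand-in that deliberately ignores the carry-in.

```python
import random

WIDTH = 4
MASK = (1 << WIDTH) - 1


def reference_adder(a, b, cin):
    """Python reference model (assumed example): returns (sum, carry_out)."""
    total = a + b + cin
    return total & MASK, total >> WIDTH


def buggy_rtl_adder(a, b, cin):
    """Stand-in for simulated RTL outputs; this hypothetical design drops cin."""
    total = a + b
    return total & MASK, total >> WIDTH


def generate_stimuli(count, seed=0):
    """Produce reproducible random input vectors (a, b, cin)."""
    rng = random.Random(seed)
    return [
        (rng.randint(0, MASK), rng.randint(0, MASK), rng.randint(0, 1))
        for _ in range(count)
    ]


def find_discrepancies(dut, reference, stimuli):
    """Run both models on each vector and collect every mismatch."""
    mismatches = []
    for vec in stimuli:
        expected = reference(*vec)
        actual = dut(*vec)
        if actual != expected:
            mismatches.append(
                {"inputs": vec, "expected": expected, "actual": actual}
            )
    return mismatches


def format_feedback(mismatches, limit=3):
    """Summarize mismatches as text that could be appended to an LLM prompt."""
    lines = [f"{len(mismatches)} mismatching test vectors."]
    for m in mismatches[:limit]:
        a, b, cin = m["inputs"]
        lines.append(
            f"a={a} b={b} cin={cin}: expected {m['expected']}, got {m['actual']}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    vectors = generate_stimuli(20)
    print(format_feedback(find_discrepancies(buggy_rtl_adder, reference_adder, vectors)))
```

In a real flow, the `dut` callable would wrap an RTL simulator run of the generated Verilog, and the feedback text would drive the next generation round until no mismatches remain. The abstract does not specify the simulator or the prompt format, so both are left open here.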