🤖 AI Summary
To address the low functional correctness of Verilog code generated by large language models (LLMs), this paper proposes VFocus, a three-stage framework that focuses LLM reasoning on critical decision points in the code generation process. In the pre-ranking stage, it samples multiple candidates, retries until the outputs are syntactically valid, and applies density-guided filtering to keep candidates within a "reasoning sweet spot". In the ranking stage, it simulates each candidate against an automatically generated testbench and clusters the candidates by self-consistency. In the post-ranking refinement stage, it mines inconsistencies among the top-ranked candidates and refines them with reasoning-augmented prompts. Evaluated on the VerilogEval-Human benchmark, it significantly improves pass@1 accuracy across multiple leading reasoning-oriented LLMs (average +12.3%). Notably, it is the first approach to explicitly model and focus reasoning on critical decision paths in complex hardware design tasks. By enabling interpretable and verifiable synthesis, the framework establishes a novel paradigm for LLM-driven hardware code generation.
📝 Abstract
Large Language Models (LLMs) have shown impressive potential in generating Verilog code, but ensuring functional correctness remains a challenge. Existing approaches often rely on self-consistency or simulation feedback to select the best candidate, but they miss opportunities to focus LLM reasoning on the most informative parts of the design. We propose VFocus, a three-stage framework that enhances Verilog generation by sharpening the focus of LLM reasoning onto critical decision points in the code generation process. In the pre-ranking stage, VFocus generates multiple code candidates through LLM prompting, retries for syntactically valid outputs, and introduces Density-guided Filtering to retain candidates that fall within the "reasoning sweet spot" for functional correctness. In the ranking stage, we simulate each code candidate using an automatically generated testbench and apply self-consistency-based clustering to identify the most consistent outputs. Finally, in the post-ranking refinement stage, VFocus performs inconsistency mining on top-ranked candidates and invokes reasoning-augmented LLM prompts for candidate refinement. Experiments on the VerilogEval-Human benchmark show that VFocus significantly improves the pass@1 correctness across multiple reasoning LLMs, demonstrating its effectiveness in enhancing Verilog generation for complex hardware design tasks.
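The self-consistency-based clustering in the ranking stage can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes each candidate has already been simulated on the same testbench and that its outputs are available as a sequence of values. The function name `cluster_by_output`, the candidate IDs, and the traces are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def cluster_by_output(
    sim_outputs: Dict[str, Sequence[Hashable]],
) -> List[List[str]]:
    """Group candidate IDs whose simulated output traces are identical.

    Clusters are returned largest first; equal-sized clusters keep
    the order in which their trace was first seen.
    """
    clusters: Dict[Tuple[Hashable, ...], List[str]] = defaultdict(list)
    for candidate_id, trace in sim_outputs.items():
        # Candidates with the same trace behave identically on this testbench.
        clusters[tuple(trace)].append(candidate_id)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical data: one output trace per candidate on a shared testbench.
example_outputs = {
    "cand_a": [0, 1, 1, 0],
    "cand_b": [0, 1, 1, 0],
    "cand_c": [0, 1, 0, 0],
    "cand_d": [0, 1, 1, 0],
}

ranked = cluster_by_output(example_outputs)
top_cluster = ranked[0]  # the largest group of mutually agreeing candidates
```

Under this assumption, the largest cluster is treated as the most consistent behavior, and disagreements between the top clusters are the natural input to a later inconsistency-mining step. Exact trace equality is the simplest grouping rule; a real system may compare outputs differently.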