🤖 AI Summary
This work addresses the problem that Verilog code generated by large language models (LLMs) often contains syntactic or structural errors. These errors cause compilation failures or functionally incorrect designs, which hinders reliable hardware design. To mitigate this, the authors propose VerilogCL, which combines contrastive learning with proactive error screening. It has two main parts. The first is a minimal-error data augmentation strategy that pairs correct RTL with minimally perturbed erroneous versions. The second is a fine-grained discrimination module that combines semantic embeddings with token-level uncertainty estimates to filter out low-confidence candidates during generation. On the VerilogEval and RTLLM benchmarks, a fine-tuned 7B-parameter model outperforms the evaluated open-source, Verilog-specialized, and commercial baselines in both compilation success rate and functional correctness.
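The summary does not specify how the discrimination module combines its two signals. The Python sketch below is illustrative only and makes several assumptions. Each candidate gets a score built from two parts:

- the cosine similarity between its semantic embedding and a hypothetical "valid RTL" prototype vector
- a penalty based on its mean per-token negative log-likelihood, used as the token-level uncertainty estimate

Candidates whose score falls below a threshold are dropped. The names `Candidate`, `screen`, `valid_prototype`, `alpha`, and `threshold` are hypothetical. The fixed weighted sum is a stand-in for what is presumably a learned module.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str                    # generated Verilog text
    embedding: list[float]       # semantic embedding (assumed to come from some encoder)
    token_logprobs: list[float]  # log-probability of each generated token


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def token_uncertainty(logprobs: list[float]) -> float:
    """Mean per-token negative log-likelihood: higher means less confident."""
    return -sum(logprobs) / len(logprobs) if logprobs else float("inf")


def screen(candidates, valid_prototype, alpha=0.5, threshold=0.0):
    """Keep candidates whose combined score clears `threshold`, best first.

    score = cosine(embedding, valid_prototype) - alpha * token_uncertainty
    (a hypothetical hand-weighted combination, not the paper's learned module)
    """
    scored = []
    for c in candidates:
        score = cosine(c.embedding, valid_prototype) - alpha * token_uncertainty(c.token_logprobs)
        if score >= threshold:
            scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


# Example: a confident, "valid-looking" candidate survives; an uncertain one is dropped.
good = Candidate("assign y = a & b;", [0.9, 0.1], [-0.05, -0.10, -0.02])
bad = Candidate("assign y = a & ;", [0.1, 0.9], [-2.0, -1.5, -3.0])
kept = screen([good, bad], valid_prototype=[1.0, 0.0])
print([c.code for c in kept])  # ['assign y = a & b;']
```

A linear score keeps the two signals easy to inspect. A small classifier trained on the same features would be the natural replacement if the weighting needs to be learned.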
📝 Abstract
Large Language Models (LLMs) have recently achieved strong performance in software code generation. However, applying them to hardware description languages (HDLs), such as Verilog, remains challenging because high-quality training data are relatively scarce. In practice, LLM-generated Verilog often contains syntactic or structural errors that either cause compilation failures or produce functionally incorrect designs, which limit its reliability in hardware design workflows.
In this work, we propose VerilogCL, an integrated framework that improves Verilog code generation by explicitly learning the boundary between correct and erroneous RTL through contrastive learning and proactive error screening. Our approach introduces minimal-error data augmentation, which pairs correct RTL with minimally perturbed erroneous RTL so that the model learns to recognize fine-grained distinctions between them. We then apply contrastive learning to sharpen this validity boundary in the representation space, improving the separation between correct and erroneous RTL. In addition, we introduce a proactive screening module that combines semantic embeddings with token-level uncertainty features to filter out low-confidence candidates during generation. Experiments on public benchmarks, including VerilogEval and RTLLM, show that our 7B-parameter model outperforms the evaluated open-source, Verilog-specialized, and commercial baselines in both compilation success rate and functional correctness.
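The minimal Python sketch below illustrates the two training-side ideas under stated assumptions.

`minimal_error_variants` applies one small text edit at a time to a correct RTL snippet to produce erroneous counterparts. The edits are a blocking/nonblocking assignment swap, a dropped semicolon, and a flipped clock edge. These perturbation rules are hypothetical and not taken from the paper.

`info_nce` is a standard InfoNCE contrastive loss. The abstract does not name its exact objective, so this loss is a stand-in. It is small when an anchor embedding is much closer to the correct RTL's embedding than to the embeddings of its perturbed variants. The anchor is assumed here to encode the design specification.

```python
import math

# Hypothetical single-edit perturbation rules: (pattern, replacement).
# Each rule introduces one small error into otherwise correct RTL.
# Note: plain text matching is crude ("<=" also matches less-or-equal comparisons).
PERTURBATIONS = [
    ("<=", "="),             # nonblocking -> blocking assignment (likely functional bug)
    (";", ""),               # drop a semicolon (syntax error)
    ("posedge", "negedge"),  # flip the clock edge (functional bug)
]


def minimal_error_variants(rtl: str) -> list[str]:
    """Return every variant of `rtl` that differs by exactly one applied rule."""
    variants = []
    for old, new in PERTURBATIONS:
        start = 0
        while (idx := rtl.find(old, start)) != -1:
            variants.append(rtl[:idx] + new + rtl[idx + len(old):])
            start = idx + len(old)
    return variants


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: -log(exp(s_pos / t) / sum_j exp(s_j / t)),
    where s_j ranges over the positive and all negatives (cosine similarities)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]


# Build (correct, erroneous) training pairs from one correct snippet.
correct = "always @(posedge clk) q <= d;"
pairs = [(correct, bad) for bad in minimal_error_variants(correct)]
for _, bad in pairs:
    print(bad)
```

In training, each correct snippet and its perturbed variants would be encoded by the model. Their embeddings would then be passed to the loss as the positive and negatives. Because each negative differs from the positive by a single edit, the loss concentrates on exactly the fine-grained distinctions the augmentation is meant to expose.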