🤖 AI Summary
Existing LLM-based approaches face significant bottlenecks in structural modeling and functional correctness when synthesizing large-scale, deeply hierarchical Verilog RTL designs with extensive module instantiation. This paper introduces ComplexVCoder, an open-source, end-to-end LLM-driven framework for generating complex RTL. Its core innovations are: (1) a two-stage generation mechanism built around a circuit-aware intermediate representation; (2) a rule-based alignment method that enforces syntactic validity and behavioral consistency; and (3) a domain-specific Retrieval-Augmented Generation (RAG) technique tailored to digital circuits. Evaluated on a benchmark of 55 real-world industrial RTL designs, the framework achieves functional correctness rates 14.6% and 22.2% higher than CodeV and RTLCoder, respectively. Notably, even with a lightweight 32B backbone (Qwen2.5-32B), it matches the performance of larger models such as GPT-3.5 and DeepSeek-V3.
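The two-stage idea above can be sketched as follows: instead of prompting an LLM to emit Verilog directly from natural language, stage 1 produces a structured, circuit-aware intermediate representation (IR) that makes the module hierarchy explicit, and stage 2 deterministically expands that IR into Verilog skeletons. Everything here (the `ModuleIR` fields, `stage1_nl_to_ir`, the rendering logic) is an illustrative assumption, not the paper's actual API.

```python
# Hedged sketch of two-stage NL -> IR -> Verilog generation. In the real
# framework an LLM fills both stages; here stage 1 is hard-coded so the
# example stays runnable.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Port:
    name: str
    direction: str   # "input" or "output"
    width: int = 1

@dataclass
class ModuleIR:
    """Circuit-aware IR: one node per module, with explicit ports and children."""
    name: str
    ports: List[Port] = field(default_factory=list)
    submodules: List["ModuleIR"] = field(default_factory=list)

def stage1_nl_to_ir(spec: str) -> ModuleIR:
    # Placeholder for the LLM stage: returns a toy top module with one child.
    top = ModuleIR("top", [Port("clk", "input"), Port("out", "output", 8)])
    top.submodules.append(
        ModuleIR("counter", [Port("clk", "input"), Port("q", "output", 8)]))
    return top

def stage2_ir_to_verilog(ir: ModuleIR) -> str:
    # Deterministic rendering of the IR into skeleton Verilog. The hierarchy
    # (module instantiation) is made explicit -- the structural information
    # that direct NL -> Verilog generation tends to lose on large designs.
    def port_decl(p: Port) -> str:
        w = f"[{p.width - 1}:0] " if p.width > 1 else ""
        return f"{p.direction} {w}{p.name}"
    lines = [f"module {ir.name}(" + ", ".join(p.name for p in ir.ports) + ");"]
    lines += [f"  {port_decl(p)};" for p in ir.ports]
    for sub in ir.submodules:
        conns = ", ".join(f".{p.name}({p.name})" for p in sub.ports)
        lines.append(f"  {sub.name} u_{sub.name}({conns});")
    lines.append("endmodule")
    children = "\n\n".join(stage2_ir_to_verilog(s) for s in ir.submodules)
    return "\n".join(lines) + ("\n\n" + children if children else "")

rtl = stage2_ir_to_verilog(stage1_nl_to_ir("8-bit counter with clock"))
print(rtl)
```

The point of the split is that the IR pins down module boundaries and instantiations before any Verilog is written, so stage 2 cannot "forget" a submodule mid-generation.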
📝 Abstract
Recent advances have demonstrated the promising capabilities of large language models (LLMs) in generating register-transfer level (RTL) code, such as Verilog. However, existing LLM-based frameworks still face significant challenges in accurately handling the complexity of real-world RTL designs, particularly those that are large-scale and involve multi-level module instantiation. To address this issue, we present ComplexVCoder, an open-source LLM-driven framework that enhances both the generation quality and efficiency of complex Verilog code. Specifically, we introduce a two-stage generation mechanism, which leverages an intermediate representation to enable a more accurate and structured transition from natural language descriptions to intricate Verilog designs. In addition, we introduce a rule-based alignment method and a domain-specific retrieval-augmented generation (RAG) approach that further improve the correctness of the synthesized code by incorporating relevant design knowledge during generation. To evaluate our approach, we construct a comprehensive dataset comprising 55 complex Verilog designs derived from real-world implementations. We also release an open-source benchmark suite for systematically assessing the quality of auto-generated RTL code, together with the ComplexVCoder framework. Experimental results show that ComplexVCoder outperforms state-of-the-art frameworks such as CodeV and RTLCoder by 14.6% and 22.2%, respectively, in functional correctness on complex Verilog benchmarks. Furthermore, ComplexVCoder achieves comparable functional correctness using a lightweight 32B model (Qwen2.5-32B), rivaling larger-scale models such as GPT-3.5 and DeepSeek-V3.
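The domain-specific RAG step described in the abstract can be illustrated with a minimal sketch: before generation, retrieve design snippets relevant to the specification from a curated Verilog corpus and prepend them to the prompt. A real system would use embedding-based retrieval over a much larger corpus; this toy version scores candidates by keyword overlap, and the two-entry `CORPUS` and all function names are illustrative assumptions.

```python
# Hedged sketch of domain-specific retrieval-augmented generation (RAG) for
# Verilog: rank corpus snippets by keyword overlap with the spec, then build
# an augmented prompt. Toy corpus and scoring for illustration only.
from collections import Counter

CORPUS = {
    "fifo": "module fifo #(parameter DEPTH=16) (/* ... */); endmodule",
    "uart_tx": "module uart_tx (input clk, input [7:0] data /* ... */); endmodule",
}

def tokenize(text: str) -> Counter:
    return Counter(text.lower().replace("-", " ").split())

def retrieve(spec: str, k: int = 1):
    # Score each (name, snippet) pair by multiset token overlap with the spec.
    q = tokenize(spec)
    scored = sorted(
        CORPUS.items(),
        key=lambda kv: -sum((q & tokenize(kv[0] + " " + kv[1])).values()))
    return [snippet for _, snippet in scored[:k]]

def build_prompt(spec: str) -> str:
    # Prepend retrieved reference designs so the generator sees concrete,
    # domain-relevant Verilog idioms alongside the task description.
    context = "\n".join(retrieve(spec))
    return f"// Reference designs:\n{context}\n// Task: {spec}\n"

prompt = build_prompt("Design a synchronous FIFO with configurable depth")
print(prompt)
```

The design choice being illustrated: retrieval is specialized to the hardware domain (a corpus of real Verilog modules) rather than generic text, so the context injected at generation time carries circuit structure the base model may lack.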