🤖 AI Summary
To address the challenge of simultaneously ensuring functional correctness and optimizing power–performance–area (PPA) metrics in large language model (LLM)-generated RTL code, this paper proposes a global-optimization framework that integrates LLMs with evolutionary computation. The method employs a dual-population genetic algorithm that decouples error correction from PPA optimization, enabling parallel, targeted refinement. It further introduces a dynamic feedback evaluation mechanism coupled with an adaptive prompt-scheduling strategy, which overcomes the local-search limitations of conventional iterative refinement. Evaluated on the VerilogEval and RTLLM benchmarks, the approach improves the initial pass rates of multiple LLMs by up to 24.0 percentage points; notably, DeepSeek-V3 achieves a final pass rate of 95.5%. Crucially, the synthesized designs exhibit superior PPA characteristics compared to human-written reference implementations.
📝 Abstract
Large Language Models (LLMs) are used for Register-Transfer Level (RTL) code generation, but they face two main challenges: functional correctness and Power, Performance, and Area (PPA) optimization. Iterative, feedback-based methods partially address these, but they are limited to local search, hindering the discovery of a global optimum. This paper introduces REvolution, a framework that combines Evolutionary Computation (EC) with LLMs for automatic RTL generation and optimization. REvolution evolves a population of candidates in parallel, each defined by a design strategy, RTL implementation, and evaluation feedback. The framework includes a dual-population algorithm that divides candidates into Fail and Success groups for bug fixing and PPA optimization, respectively. An adaptive mechanism further improves search efficiency by dynamically adjusting the selection probability of each prompt strategy according to its success rate. Experiments on the VerilogEval and RTLLM benchmarks show that REvolution increased the initial pass rate of various LLMs by up to 24.0 percentage points. The DeepSeek-V3 model achieved a final pass rate of 95.5%, comparable to state-of-the-art results, without the need for separate training or domain-specific tools. Additionally, the generated RTL designs showed significant PPA improvements over reference designs. This work introduces a new RTL design approach by combining LLMs' generative capabilities with EC's broad search power, overcoming the local-search limitations of previous methods.
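The dual-population loop described above can be sketched in a few dozen lines. The sketch below is a hypothetical illustration, not the paper's implementation: the strategy names, the `Candidate` fields, the Laplace-smoothed success-rate weighting, and the elitist survivor selection are all assumptions standing in for the real LLM calls and synthesis-tool feedback.

```python
import random

# Hypothetical prompt strategies; the Fail group receives repair prompts,
# the Success group receives PPA-optimization prompts (per the abstract).
REPAIR = ["fix_syntax", "fix_logic"]
OPTIMIZE = ["reduce_area", "reduce_power", "improve_timing"]

class Candidate:
    """One population member: design strategy, RTL text, and feedback."""
    def __init__(self, strategy, rtl, passed, ppa_score):
        self.strategy = strategy
        self.rtl = rtl
        self.passed = passed        # functional-correctness feedback
        self.ppa_score = ppa_score  # toy stand-in for PPA; lower is better

def adaptive_pick(stats, strategies):
    # Selection probability proportional to each strategy's success rate,
    # Laplace-smoothed so unused strategies keep nonzero probability.
    weights = [(s, (stats[s]["wins"] + 1) / (stats[s]["tries"] + 2))
               for s in strategies]
    r = random.random() * sum(w for _, w in weights)
    acc = 0.0
    for s, w in weights:
        acc += w
        if r < acc:
            return s
    return weights[-1][0]

def evolve(population, mutate, generations=10):
    stats = {s: {"wins": 0, "tries": 0} for s in REPAIR + OPTIMIZE}
    for _ in range(generations):
        children = []
        for parent in population:
            # Dual-population split: failing candidates get repair prompts,
            # passing candidates get PPA-optimization prompts.
            pool = REPAIR if not parent.passed else OPTIMIZE
            strategy = adaptive_pick(stats, pool)
            child = mutate(parent, strategy)  # an LLM call in the real system
            stats[strategy]["tries"] += 1
            improved = (child.passed and not parent.passed) or \
                       (child.passed and child.ppa_score < parent.ppa_score)
            if improved:
                stats[strategy]["wins"] += 1
            children.append(child)
        # Elitist survivor selection: passing candidates first, then best PPA.
        merged = sorted(population + children,
                        key=lambda c: (not c.passed, c.ppa_score))
        population = merged[:len(population)]
    return population
```

A toy `mutate` that occasionally fixes a bug or shaves the PPA score is enough to see the population migrate from the Fail group into the Success group and then improve PPA; in the actual framework that role is played by prompted LLM generations scored against testbench and synthesis feedback.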