🤖 AI Summary
Large language models (LLMs) suffer from severe hallucinations and functional errors when generating hardware description language (HDL) code, primarily due to scarce domain-specific training data. To address this without fine-tuning, we propose a lightweight, inference-time framework. Our method introduces two key innovations: (1) HDL-aware self-verifying chain-of-thought prompting, which integrates domain knowledge to guide stepwise reasoning and iterative self-validation; and (2) a two-stage heterogeneous retrieval-augmented generation (RAG) mechanism that jointly performs critical component extraction and sequential re-ranking to balance syntactic robustness and semantic relevance. Evaluated on the RTLLM2.0 benchmark, our approach significantly reduces hallucination rates while substantially improving both syntactic and functional correctness. Remarkably, it achieves state-of-the-art performance without any parameter updates or fine-tuning.
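The summary above does not give implementation details, so the following is only a minimal sketch of what the self-verifying chain-of-thought loop could look like. The `llm` and `compile_check` callables, the prompt wording, and the retry budget are all assumptions, not the paper's actual method; in practice `llm` would be a real model API and `compile_check` a Verilog linter or simulator.

```python
def hdl_cot_generate(task: str, llm, compile_check, max_rounds: int = 3) -> str:
    """Sketch of HDL-aware CoT prompting with iterative self-verification.

    `llm` and `compile_check` are hypothetical callables standing in for a
    real model API and an HDL syntax/functionality checker.
    """
    # Step 1: classify the task so the prompt can carry matching domain hints.
    category = llm(f"Classify this HDL task by type and complexity: {task}")
    prompt = (
        f"Task ({category}): {task}\n"
        "Think step by step: declare the module interface first, then the "
        "internal registers, then the combinational and sequential logic.\n"
        "After writing the code, simulate it mentally on a few input vectors "
        "and fix any mismatch before answering."
    )
    code = llm(prompt)
    # Step 2: self-validation loop - feed checker errors back until clean
    # or the retry budget is exhausted.
    for _ in range(max_rounds):
        ok, feedback = compile_check(code)
        if ok:
            break
        code = llm(f"{prompt}\nYour previous attempt failed:\n{feedback}\nRevise the code.")
    return code
```

The loop terminates either on a clean check or after `max_rounds` revisions, so a persistently failing model cannot stall generation.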
📝 Abstract
Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, when applied to hardware description languages (HDL), these models exhibit significant limitations due to data scarcity, resulting in hallucinations and incorrect code generation. To address these challenges, we propose HDLCoRe, a training-free framework that enhances LLMs' HDL generation capabilities through prompt engineering techniques and retrieval-augmented generation (RAG). Our approach consists of two main components: (1) an HDL-aware Chain-of-Thought (CoT) prompting technique with self-verification that classifies tasks by complexity and type, incorporates domain-specific knowledge, and guides LLMs through step-by-step self-simulation for error correction; and (2) a two-stage heterogeneous RAG system that addresses formatting inconsistencies through key component extraction and efficiently retrieves relevant HDL examples through sequential filtering and re-ranking. HDLCoRe requires no model fine-tuning, yet substantially improves generation quality. Experimental results demonstrate that our framework achieves superior performance on the RTLLM2.0 benchmark, significantly reducing hallucinations and improving both syntactic and functional correctness.
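To make the two-stage retrieval idea concrete, here is a minimal sketch assuming a corpus of raw Verilog strings. The regexes for "key component extraction" and the bag-of-words re-ranking score are illustrative stand-ins chosen for this example, not the paper's actual extraction rules or similarity metric.

```python
import re
from collections import Counter

def extract_key_components(verilog_src: str) -> set:
    """Normalize a Verilog snippet down to key components (module name,
    port identifiers, structural keywords) so formatting differences
    between corpus entries do not affect matching. The regexes here are
    illustrative, not a full Verilog parser."""
    tokens = set()
    tokens.update(re.findall(r"\bmodule\s+(\w+)", verilog_src))
    tokens.update(re.findall(r"\b(?:input|output|inout)\b[^;,]*?(\w+)\s*[;,]", verilog_src))
    tokens.update(re.findall(r"\b(always|assign|case|posedge|negedge)\b", verilog_src))
    return tokens

def retrieve(query_src: str, corpus: list, k_filter: int = 3, k_final: int = 1) -> list:
    """Two-stage retrieval: (1) a cheap filter by key-component overlap,
    (2) re-ranking of the survivors by a finer-grained token similarity."""
    q_comp = extract_key_components(query_src)
    # Stage 1: keep the k_filter entries with the highest component overlap.
    filtered = sorted(
        corpus, key=lambda d: -len(q_comp & extract_key_components(d))
    )[:k_filter]
    # Stage 2: re-rank by an (unnormalized) dot product over word counts.
    q_bag = Counter(re.findall(r"\w+", query_src))
    def score(doc):
        d_bag = Counter(re.findall(r"\w+", doc))
        return sum(q_bag[t] * d_bag[t] for t in q_bag)
    return sorted(filtered, key=score, reverse=True)[:k_final]
```

The design point is the split itself: the coarse first stage is robust to formatting noise because it compares extracted components rather than raw text, while the second stage only pays the finer-grained scoring cost on a short candidate list.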