🤖 AI Summary
Natural-language-to-Verilog synthesis still struggles with functional correctness under data-scarce conditions. This paper proposes Abstraction-of-Thought (AoT), a training-free, inference-time prompting framework that introduces a task-driven, three-layer abstraction paradigm: (1) pattern classification, (2) a hardware-specific structured intermediate representation (IR), and (3) line-level traceable pseudocode. AoT explicitly decouples functional decomposition from syntactic generation, mitigating the semantic distortion inherent in end-to-end direct translation. Compatible with arbitrary black-box large language models (LLMs), AoT requires no model fine-tuning, only inference-time optimization. On the VerilogEval benchmark, AoT achieves substantial gains in functional correctness over Chain-of-Thought, Tree-of-Thought, and other baselines, while reducing generated token count by 1.8×–5.2×. Notably, it significantly enhances the hardware-oriented synthesis capability of general-purpose LLMs such as GPT-4o.
📄 Abstract
Large language models (LLMs) have achieved impressive proficiency on logic and programming tasks, often rivaling expert-level performance. However, generating functionally correct hardware description language (HDL) code from natural language specifications remains challenging, particularly in data-scarce domains. We therefore present Abstraction-of-Thought (AoT), a training-free, inference-only prompting framework that mitigates LLM misinterpretations and reasoning pitfalls through a series of task-based abstractions within the prompting procedure, assisting the transition from high-level to low-level hardware representations. AoT consists of three stages: (1) an LLM-based classification of hardware design patterns, (2) a structured intermediate representation (IR) that separates functional decomposition from code syntax, and (3) a line-by-line pseudocode solution enabling a more direct mapping to the final Verilog implementation. Experimental results on the VerilogEval benchmark show that AoT improves functional correctness when applied to large non-reasoning models (such as GPT-4o), outperforming all baseline techniques (including 1-shot, Chain-of-Thought, and Tree-of-Thought) while reducing generated tokens by 1.8×–5.2× compared to the popular Tree-of-Thought prompting.
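The three stages above can be sketched as a chain of prompts to a generic black-box model. Everything here, including the prompt wording, the `aot_generate` helper, and the offline stub, is an illustrative assumption rather than the paper's actual templates:

```python
from typing import Callable

def aot_generate(spec: str, llm: Callable[[str], str]) -> str:
    """Hypothetical AoT pipeline: classify -> structured IR -> pseudocode -> Verilog."""
    # Stage 1: classify the hardware design pattern (e.g. FSM vs. combinational).
    pattern = llm(f"Classify the hardware design pattern of this spec:\n{spec}")
    # Stage 2: structured IR that decouples functional decomposition from syntax.
    ir = llm(f"Decompose this {pattern} spec into a structured IR "
             f"(modules, ports, signals, behavior):\n{spec}")
    # Stage 3: line-level pseudocode, then a direct line-by-line mapping to Verilog.
    pseudo = llm(f"Write line-by-line pseudocode for this IR:\n{ir}")
    return llm(f"Translate this pseudocode to Verilog, line by line:\n{pseudo}")

def stub_llm(prompt: str) -> str:
    """Stand-in for a black-box LLM so the sketch runs offline; canned replies."""
    if prompt.startswith("Classify"):
        return "combinational"
    if prompt.startswith("Decompose"):
        return "module: xor_gate; ports: input a, input b, output y; behavior: y = a XOR b"
    if prompt.startswith("Write"):
        return "1: y = a XOR b"
    return "module xor_gate(input a, input b, output y);\n  assign y = a ^ b;\nendmodule"

print(aot_generate("Implement a 1-bit XOR gate.", stub_llm))
```

Since AoT is training-free, swapping `stub_llm` for a real model client is the only change needed to target any black-box LLM.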