SAGE-HLS: Syntax-Aware AST-Guided LLM for High-Level Synthesis Code Generation

📅 2025-08-05

🤖 AI Summary

To address the scarcity of publicly available high-level synthesis (HLS) code datasets, which limits the use of large language models (LLMs) for HLS, this work introduces SAGE-HLS, a fine-tuned LLM for HLS code generation. The authors port verified, synthesizable Verilog designs to C/C++ to build a dataset of 16.7K HLS code samples, fine-tune the model with instruction prompts guided by abstract syntax trees (ASTs), and assess functionality with a semi-automated evaluation framework built on VerilogEval. Fine-tuned on QwenCoder (2.5) 7B, SAGE-HLS reaches nearly 100% synthesizability and 75% functional correctness. Key contributions: (1) a 16.7K-sample HLS dataset created by Verilog-to-C/C++ porting; (2) an AST-guided instruction fine-tuning strategy; and (3) a semi-automated framework for evaluating the functionality of generated HLS code.

📝 Abstract

In today's rapidly evolving field of electronic design automation (EDA), the complexity of hardware designs is increasing, necessitating more sophisticated automation solutions. High-level synthesis (HLS), as a pivotal solution, automates hardware design from high-level abstractions (e.g., C/C++). However, it faces significant challenges, particularly in design space exploration and optimization. While large language models (LLMs) have shown notable capabilities in code generation, their application to HLS has been limited by the scarcity of publicly available HLS code datasets. Hence, research in this domain has primarily focused on techniques such as prompt engineering and retrieval-augmented generation (RAG). To overcome this limitation, this paper introduces SAGE-HLS, the first-of-its-kind fine-tuned LLM specifically for HLS code generation. Our method includes three key advancements: (i) we implement Verilog-to-C/C++ porting, converting verified and synthesizable Verilog code into corresponding C/C++ code and creating a dataset of 16.7K HLS code samples; (ii) we implement a fine-tuning strategy based on instruction prompting for code generation, guided by the abstract syntax tree (AST); (iii) we develop a semi-automated evaluation framework using VerilogEval to assess the functionality of the generated HLS code. Our experiments show that SAGE-HLS, fine-tuned on the QwenCoder (2.5) 7B model, achieves a nearly 100% success rate in code synthesizability and a 75% success rate in functional correctness.

Problem

Research questions and friction points this paper is trying to address.

Addresses HLS code generation challenges using LLMs

Overcomes scarcity of publicly available HLS datasets

Improves synthesizability and functional correctness of HLS code
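
The two reported metrics, synthesizability and functional correctness, can be read as simple success rates over a set of generated designs. The sketch below is only an illustration: the `SampleResult` record, the `success_rates` helper, and the toy inputs are hypothetical, and it assumes both rates use the full sample count as the denominator, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    """Outcome for one generated HLS design (hypothetical record layout)."""
    synthesizable: bool  # the HLS tool accepted the generated code
    functional: bool     # outputs matched the reference test


def success_rates(results):
    """Return (synthesizability rate, functional-correctness rate).

    Assumption: both rates are fractions of ALL samples, and a design that
    fails synthesis is never counted as functionally correct.
    """
    if not results:
        return 0.0, 0.0
    n = len(results)
    synth = sum(r.synthesizable for r in results) / n
    func = sum(r.synthesizable and r.functional for r in results) / n
    return synth, func


# Toy inputs for illustration only; these are not results from the paper.
results = [
    SampleResult(True, True),
    SampleResult(True, False),
    SampleResult(False, False),
    SampleResult(True, True),
]
synth_rate, func_rate = success_rates(results)
print(f"synthesizable: {synth_rate:.0%}, functionally correct: {func_rate:.0%}")
```

Counting a non-synthesizable design as a functional failure keeps the second rate from exceeding the first; a different evaluation setup might instead report functional correctness only over the synthesizable subset.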

Innovation

Methods, ideas, or system contributions that make the work stand out.

Verilog-to-C/C++ porting for dataset creation

AST-guided fine-tuning with instruction prompting

Semi-automated evaluation framework built on VerilogEval
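
One way to picture AST-guided instruction fine-tuning is a training record that pairs an instruction and a serialized syntax tree with the target HLS code. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the prompt layout, the `serialize_ast` and `build_sample` helpers, and the hand-written tree are all hypothetical, and a real setup would obtain the AST from a C/C++ parser.

```python
# Hypothetical AST for a tiny HLS-style C function, written by hand as
# (node_type, label, children) tuples instead of being produced by a parser.
ADDER_AST = (
    "FunctionDef", "adder", [
        ("Param", "unsigned char a", []),
        ("Param", "unsigned char b", []),
        ("Return", "", [
            ("BinaryOp", "+", [("Name", "a", []), ("Name", "b", [])]),
        ]),
    ],
)

# Target code the model would be trained to emit for this sample.
ADDER_CODE = """unsigned short adder(unsigned char a, unsigned char b) {
    return a + b;
}"""


def serialize_ast(node, depth=0):
    """Render the tree as an indented outline, one node per line."""
    node_type, label, children = node
    line = "  " * depth + node_type + (f"({label})" if label else "")
    return "\n".join([line] + [serialize_ast(c, depth + 1) for c in children])


def build_sample(instruction, ast_root, code):
    """Assemble one instruction-tuning record (assumed prompt format)."""
    prompt = (
        f"### Instruction\n{instruction}\n\n"
        f"### AST\n{serialize_ast(ast_root)}\n\n"
        "### Response\n"
    )
    return {"prompt": prompt, "completion": code}


sample = build_sample(
    "Write an HLS C function that adds two 8-bit unsigned inputs.",
    ADDER_AST,
    ADDER_CODE,
)
print(sample["prompt"] + sample["completion"])
```

Placing the tree outline in the prompt gives the model an explicit structural target alongside the natural-language instruction; whether the paper feeds the AST in this form or uses it differently is not stated in the abstract.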
