🤖 AI Summary
High-level synthesis (HLS) for FPGAs faces challenges in timing closure and architecture-specific optimization, which rely heavily on manual pragma insertion and lack automated, intelligent support. Method: This paper proposes TimelyHLS, a framework that integrates large language models (LLMs) with retrieval-augmented generation (RAG) into the HLS flow. It builds a timing-aware, structured FPGA knowledge base and combines synthesis-log feedback with closed-loop evaluation on commercial toolchains to infer and iteratively refine architecture-specific pragmas automatically. Contribution/Results: Evaluated across ten FPGA platforms, TimelyHLS achieves up to 3.85× latency speedup for matrix multiplication and a 57% flip-flop (register) reduction for Viterbi decoding while maintaining functional correctness and timing closure, and it reduces manual tuning effort by up to 70%.
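To make the knowledge-base idea more concrete, the sketch below shows one way architecture-specific pragma templates could be retrieved and packed into an LLM prompt. It is not TimelyHLS's implementation. The entries, tags, and family labels (`ultrascale_plus`, `zynq7000`) are hypothetical. Tag-overlap ranking is a deliberately simple stand-in for whatever retriever the paper uses; RAG systems commonly rank by embedding similarity instead. The pragma strings follow AMD Vitis HLS syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KBEntry:
    """One record in a hypothetical structured FPGA knowledge base."""
    platform: str          # FPGA family the entry applies to (hypothetical labels)
    tags: frozenset        # kernel/timing features the entry is relevant to
    pragma: str            # pragma template offered to the LLM
    note: str              # short rationale included in the prompt


# Hypothetical entries; pragma strings follow AMD Vitis HLS syntax.
KB = [
    KBEntry("ultrascale_plus", frozenset({"loop", "timing"}),
            "#pragma HLS pipeline II=1",
            "overlap loop iterations, starting one per cycle"),
    KBEntry("ultrascale_plus", frozenset({"matmul", "array"}),
            "#pragma HLS array_partition variable=A type=cyclic factor=4 dim=2",
            "spread A across memories so unrolled loops can read in parallel"),
    KBEntry("ultrascale_plus", frozenset({"matmul", "loop"}),
            "#pragma HLS unroll factor=4",
            "replicate the loop body 4x, trading area for latency"),
    KBEntry("zynq7000", frozenset({"loop", "timing"}),
            "#pragma HLS pipeline II=2",
            "relaxed throughput target for a smaller device"),
]


def retrieve(kb, platform, query_tags, k=2):
    """Return up to k entries for `platform`, ranked by tag overlap with the query.

    Tag overlap is a simple stand-in for embedding-based similarity search.
    Entries that share no tag with the query are dropped.
    """
    candidates = [e for e in kb if e.platform == platform]
    # sorted() is stable, so equally scored entries keep their KB order.
    ranked = sorted(candidates, key=lambda e: len(e.tags & query_tags), reverse=True)
    return [e for e in ranked[:k] if e.tags & query_tags]


def build_prompt(kernel_src, platform, entries):
    """Assemble the retrieved templates and the kernel into a single LLM prompt."""
    lines = [f"Target FPGA family: {platform}", "Relevant pragma templates:"]
    lines += [f"- {e.pragma}  // {e.note}" for e in entries]
    lines += ["Kernel:", kernel_src]
    return "\n".join(lines)


hits = retrieve(KB, "ultrascale_plus", {"matmul", "array"})
print(build_prompt("void mm(int A[8][8], int B[8][8], int C[8][8]);",
                   "ultrascale_plus", hits))
```

Keeping entries keyed by platform means the same kernel can receive different pragma suggestions on different devices. This reflects the architecture-aware behavior the summary describes, though the actual knowledge-base schema is not given here.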
📝 Abstract
Achieving timing closure and design-specific optimizations in FPGA-targeted High-Level Synthesis (HLS) remains a significant challenge. The difficulty stems from the complex interaction between architectural constraints and resource utilization, and from the absence of automated support for platform-specific pragmas. In this work, we propose TimelyHLS, a novel framework integrating Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to automatically generate and iteratively refine HLS code optimized for FPGA-specific timing and performance requirements. TimelyHLS is driven by a structured architectural knowledge base containing FPGA-specific features, synthesis directives, and pragma templates. Given a kernel, TimelyHLS generates HLS code annotated with both timing-critical and design-specific pragmas. The synthesized RTL is then evaluated with commercial toolchains, and functional correctness is verified in simulation against reference outputs using custom testbenches. When functional discrepancies arise, TimelyHLS feeds the synthesis logs and performance reports back into the LLM engine and iteratively refines the code. Experimental results across 10 FPGA architectures and diverse benchmarks show that TimelyHLS reduces the need for manual tuning by up to 70%. It also achieves up to 4x latency speedup (e.g., 3.85x for Matrix Multiplication, 3.7x for Bitonic Sort) and over 50% area savings in certain cases (e.g., 57% FF reduction in Viterbi). TimelyHLS consistently achieves timing closure and functional correctness across platforms, highlighting the effectiveness of LLM-driven, architecture-aware synthesis in automating FPGA design.
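The generate, synthesize, verify, and feed-back cycle described in the abstract can be sketched as a small driver loop. This is a minimal illustration under assumed interfaces, not the TimelyHLS code. `generate`, `synthesize`, and `simulate` are hypothetical stand-ins for the LLM call, the commercial synthesis run, and the testbench. The acceptance rule is also an assumption: non-negative worst slack, plus simulation outputs that exactly match the reference. The `fake_*` stubs and their slack numbers are invented purely so the example runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SynthReport:
    worst_slack_ns: float  # negative means the timing constraint was missed
    log: str               # synthesis log excerpt to feed back to the LLM


@dataclass
class Result:
    code: str
    iterations: int
    converged: bool


def refine(kernel: str,
           generate: Callable[[str, list[str]], str],
           synthesize: Callable[[str], SynthReport],
           simulate: Callable[[str], list[int]],
           reference: list[int],
           max_iters: int = 5) -> Result:
    """Generate, synthesize, and simulate until timing and outputs both pass."""
    feedback: list[str] = []
    code = kernel
    for i in range(1, max_iters + 1):
        code = generate(kernel, feedback)      # LLM call (hypothetical)
        report = synthesize(code)              # toolchain run (hypothetical)
        outputs = simulate(code)               # testbench run (hypothetical)
        timing_ok = report.worst_slack_ns >= 0.0
        correct = outputs == reference
        if timing_ok and correct:
            return Result(code, i, True)
        # Accumulate diagnostics so the next generation can react to them.
        if not timing_ok:
            feedback.append(f"Timing violated (slack {report.worst_slack_ns} ns): {report.log}")
        if not correct:
            feedback.append(f"Output mismatch: got {outputs}, expected {reference}")
    return Result(code, max_iters, False)


# Fake stubs for illustration only; no real LLM or synthesis tool is invoked.
def fake_generate(kernel: str, feedback: list[str]) -> str:
    # Try a full unroll first; after a timing complaint, switch to pipelining.
    timed_out = any(f.startswith("Timing") for f in feedback)
    pragma = "#pragma HLS pipeline II=1" if timed_out else "#pragma HLS unroll"
    return pragma + "\n" + kernel


def fake_synthesize(code: str) -> SynthReport:
    slack = -0.8 if "unroll" in code else 0.3  # made-up numbers
    return SynthReport(slack, "long combinational path" if slack < 0 else "ok")


def fake_simulate(code: str) -> list[int]:
    return [1, 2, 3]


res = refine("void k(int *x);", fake_generate, fake_synthesize,
             fake_simulate, reference=[1, 2, 3])
print(res.iterations, res.converged)  # 2 True
```

Capping the loop with `max_iters` keeps a non-converging design from looping forever. The accumulated `feedback` list plays the role of the synthesis logs and reports that the abstract says are returned to the LLM engine.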