🤖 AI Summary
This work addresses a limitation of existing large language model (LLM)-driven high-level synthesis (HLS) approaches, which train for functional correctness while neglecting quality-of-results (QoR) optimization. To bridge this gap, the authors propose HLS-Seek, a QoR-aware framework for natural language to HLS code generation. HLS-Seek pairs a high-fidelity comparative proxy reward model with an uncertainty-aware Monte Carlo (MC) dropout switching mechanism that falls back to real synthesis only for low-confidence candidates and updates the proxy online. The result is a self-improving reward system that mitigates reward hacking and avoids running costly real synthesis on every candidate. Evaluated on HLS-eval using a 7B-parameter LLM, HLS-Seek achieves 81.5% syntax-correctness pass@1 and 81.4% Func@5, trains 8.5× faster than reinforcement learning with real rewards, and attains the lowest latency on 16 of 30 kernels, Pareto-dominating HLS-specific baselines on 9 kernels.
📝 Abstract
High-Level Synthesis (HLS) compiles algorithmic C/C++ descriptions into hardware, with Quality of Results (QoR) -- latency and resource utilization -- critically governed by pragma configurations and code structure. Existing LLM-based HLS approaches train for functional correctness but ignore QoR entirely. We observe that reinforcement learning (RL) for HLS does not require absolute synthesis results -- only relative comparisons between candidates. Based on this insight, we propose \textbf{HLS-Seek}, a QoR-aware NL-to-HLS framework that replaces expensive synthesis-in-the-loop RL with a comparative proxy reward model achieving 99.53\% Pareto-dominance accuracy. To prevent reward hacking, we introduce \textit{uncertainty-aware Monte Carlo (MC) dropout switching}, which selectively invokes real Vitis HLS synthesis for low-confidence candidates and updates the proxy online, creating a self-improving reward system. With only 7B parameters, HLS-Seek achieves 81.5\% syntax-correctness pass@1 and 81.4\% Func@5 on HLS-eval, surpassing GPT-5.1 and other frontier models while training 8.5$\times$ faster than real-reward RL. On QoR evaluation, HLS-Seek achieves the lowest latency on 16/30 kernels and Pareto-dominates HLS-specific baselines on 9 kernels.
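The switching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pareto_dominates`, `mc_dropout_preference`, `comparative_reward`), the use of the standard deviation across stochastic forward passes as the uncertainty measure, and the threshold value are all assumptions made for illustration. The `proxy` argument stands in for a reward model evaluated with dropout left active, and `synthesize` stands in for a real synthesis run that returns QoR metrics where lower is better.

```python
import statistics
from typing import Callable, Sequence, Tuple

QoR = Sequence[float]  # e.g. (latency, resource_usage); lower is better (assumed convention)


def pareto_dominates(a: QoR, b: QoR) -> bool:
    """True if a is no worse than b on every metric and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def mc_dropout_preference(
    proxy: Callable[[str, str], float],
    cand_a: str,
    cand_b: str,
    n_samples: int = 20,
) -> Tuple[float, float]:
    """Query a stochastic proxy n_samples times.

    Each call is assumed to be one forward pass with dropout enabled, returning
    the predicted probability that cand_a Pareto-dominates cand_b.
    Returns the mean prediction and its standard deviation across passes.
    """
    probs = [proxy(cand_a, cand_b) for _ in range(n_samples)]
    return statistics.fmean(probs), statistics.pstdev(probs)


def comparative_reward(
    proxy: Callable[[str, str], float],
    synthesize: Callable[[str], QoR],
    cand_a: str,
    cand_b: str,
    std_threshold: float = 0.1,  # hypothetical confidence cutoff
    n_samples: int = 20,
) -> Tuple[float, bool]:
    """Return (reward, used_real_synthesis) for the pair (cand_a, cand_b).

    When the proxy's predictions disagree across dropout passes, fall back to
    real synthesis and derive the label from the measured QoR. A caller could
    then use that label as a fresh training example to update the proxy.
    """
    mean, std = mc_dropout_preference(proxy, cand_a, cand_b, n_samples)
    if std > std_threshold:
        label = 1.0 if pareto_dominates(synthesize(cand_a), synthesize(cand_b)) else 0.0
        return label, True
    return mean, False
```

The second element of the returned tuple tells the training loop whether a real synthesis label was produced, which is the point at which an online update of the proxy would be triggered under this sketch's assumptions.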