🤖 AI Summary
Deep learning models exhibit poor generalization in high-level synthesis (HLS) hardware design prediction and struggle to transfer across diverse design configurations. To address this, we propose Iceberg, a novel pretraining paradigm integrating large language model (LLM)-based program generation, weakly supervised label expansion, and in-context meta-learning. Iceberg constructs training signals from synthetic data that jointly satisfy realism and proximity constraints, eliminating the need for manual annotation and significantly improving adaptability to unseen designs. Experiments demonstrate that Iceberg achieves an 86.4% gain in geometric mean accuracy under few-shot transfer across six real-world HLS tasks. In offline design-space exploration on two realistic benchmarks, it further improves search efficiency by 2.47× and 1.12×, respectively. Iceberg thus establishes a scalable paradigm for intelligent HLS optimization that requires no human-labeled data while enabling robust cross-configuration generalization.
📝 Abstract
Deep learning-based prediction models for High-Level Synthesis (HLS) of hardware designs often struggle to generalize. In this paper, we study how to close the generalizability gap of these models through pretraining on synthetic data and introduce Iceberg, a synthetic data augmentation approach that expands both large language model (LLM)-generated programs and weak labels of unseen design configurations. Our weak label generation method is integrated with an in-context model architecture, enabling meta-learning from actual and proximate labels. Iceberg improves the geometric mean modeling accuracy by 86.4% when adapting to six real-world applications with few-shot examples, and achieves 2.47× and 1.12× better offline DSE performance when adapting to two different test datasets. Our open-source code is available at https://github.com/UCLA-VAST/iceberg