🤖 AI Summary
Deep learning models exhibit poor generalization in high-level synthesis (HLS) hardware design prediction and struggle to transfer across diverse design configurations. To address this, we propose Iceberg, a novel pretraining paradigm integrating large language model (LLM)-based program generation, weakly supervised label expansion, and in-context meta-learning. Iceberg constructs training signals from synthetic data that jointly satisfy realism and proximity constraints, eliminating the need for manual annotation and significantly improving adaptability to unseen designs. Experiments demonstrate that Iceberg achieves an 86.4% gain in geometric mean accuracy under few-shot transfer across six real-world HLS tasks. In offline design-space exploration on two realistic benchmarks, it further improves search efficiency by 2.47× and 1.12×, respectively. Iceberg thus establishes a scalable paradigm for intelligent HLS optimization that requires no human-labeled data while enabling robust cross-configuration generalization.
📝 Abstract
Deep learning-based prediction models for High-Level Synthesis (HLS) of hardware designs often struggle to generalize. In this paper, we study how to close the generalizability gap of these models through pretraining on synthetic data and introduce Iceberg, a synthetic data augmentation approach that expands both large language model (LLM)-generated programs and weak labels of unseen design configurations. Our weak label generation method is integrated with an in-context model architecture, enabling meta-learning from actual and proximate labels. Iceberg improves the geometric mean modeling accuracy by 86.4% when adapting to six real-world applications with few-shot examples, and achieves 2.47× and 1.12× better offline DSE performance when adapting to two different test datasets. Our open-source code is available at https://github.com/UCLA-VAST/iceberg