🤖 AI Summary
Neural architecture search (NAS) faces significant challenges in highly expressive search spaces, such as those based on context-free grammars, including prohibitive evaluation costs and poor cross-dataset generalization. To address this, we propose a transferable surrogate modeling framework that leverages either zero-cost proxies combined with neural graph features (GRAF) or fine-tuned large language models to enable accurate cross-dataset prediction of architecture performance. Our work is the first to systematically demonstrate strong generalization of such surrogate models in cross-domain settings, supporting both architecture pre-screening and direct substitution of the expensive training objective. The method substantially reduces NAS computational overhead, discovers superior architectures on unseen datasets, and achieves high prediction accuracy with robust transferability, effectively balancing search efficiency and final model performance.
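As a rough sketch of the tabular branch of this setup, the snippet below trains a simple regressor on concatenated zero-cost-proxy (ZCP) and GRAF feature vectors and checks rank correlation on held-out architectures. The random-forest choice, the feature dimensions, and the synthetic data are illustrative stand-ins and not the paper's actual configuration.

```python
# Hedged sketch (assumed setup): fit a tabular surrogate on ZCP + GRAF features
# to predict validation accuracy, then score it by rank correlation, the usual
# metric for performance predictors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from scipy.stats import kendalltau

rng = np.random.default_rng(0)

# Stand-in data: one row per architecture, columns = concatenated ZCP + GRAF
# feature values (hypothetical dimensions).
n_archs, n_feats = 1000, 32
X = rng.normal(size=(n_archs, n_feats))    # placeholder ZCP + GRAF features
y = rng.uniform(0.5, 0.95, size=n_archs)   # placeholder validation accuracies

# Train on architectures from one dataset ...
surrogate = RandomForestRegressor(n_estimators=300, random_state=0)
surrogate.fit(X[:800], y[:800])

# ... and evaluate ranking quality on held-out (or, in the transfer setting,
# cross-dataset) architectures.
tau, _ = kendalltau(surrogate.predict(X[800:]), y[800:])
print(f"Kendall tau on held-out architectures: {tau:.3f}")
```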
📝 Abstract
Neural architecture search (NAS) faces a challenge in balancing the exploration of expressive, broad search spaces that enable architectural innovation against the need for efficient evaluation of architectures in order to search such spaces effectively. We investigate surrogate model training for improving search in highly expressive NAS search spaces based on context-free grammars. We show that i) surrogate models trained either on zero-cost-proxy metrics and neural graph features (GRAF) or by fine-tuning an off-the-shelf language model (LM) have high predictive power for the performance of architectures both within and across datasets, ii) these surrogates can be used to filter out bad architectures when searching on novel datasets, thereby significantly speeding up search and achieving better final performance, and iii) the surrogates can be further used directly as the search objective for huge speed-ups.
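To make the two usage modes in ii) and iii) concrete, here is a hedged sketch of a single search step: pre-screening keeps only the surrogate's top-ranked candidates for full training, while objective mode skips training entirely. The helpers `sample_architecture`, `featurize`, and `train_and_evaluate` are hypothetical placeholders for the grammar-based sampler, ZCP/GRAF feature extraction, and full training, respectively.

```python
# Illustrative sketch of the two surrogate usage modes; not the paper's exact
# search algorithm.
def search(surrogate, sample_architecture, featurize, train_and_evaluate,
           n_candidates=500, n_trained=20, surrogate_as_objective=False):
    # Sample candidates from the expressive (e.g. grammar-based) search space
    # and rank them by predicted performance.
    candidates = [sample_architecture() for _ in range(n_candidates)]
    scores = surrogate.predict([featurize(a) for a in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)

    if surrogate_as_objective:
        # Mode iii): the surrogate score *is* the objective; no training at all.
        return ranked[0][1]

    # Mode ii): pay full training cost only for the surrogate's short list.
    best_score, best_arch = max(
        ((train_and_evaluate(arch), arch) for _, arch in ranked[:n_trained]),
        key=lambda p: p[0],
    )
    return best_arch
```

In pre-screening mode the training budget is spent on only `n_trained` of the `n_candidates` sampled architectures, which is where the speed-up comes from; objective mode removes training from the loop entirely at the cost of relying fully on the surrogate's ranking.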