🤖 AI Summary
Neural architecture search (NAS) remains hindered by prohibitive computational costs. To address this, we propose Warm-Start Supernetwork Transfer NAS, a novel framework that integrates optimal transport theory with multi-dataset joint pretraining to enable efficient cross-task supernetwork parameter reuse. Our method supports zero-shot forward transfer, substantially enhancing the robustness and generalization of differentiable NAS approaches (e.g., DARTS). Evaluated across dozens of image classification benchmarks, it accelerates supernetwork training by 3–5×; the searched architectures consistently outperform from-scratch baselines and achieve positive forward transfer on nearly all target datasets. Key contributions are: (1) the first application of optimal transport to supernetwork transfer in NAS; (2) a unified framework co-optimizing multi-dataset pretraining and parameter transfer; and (3) a practical NAS paradigm balancing high efficiency with strong generalization.
📄 Abstract
Hand-designing neural networks is a tedious process that requires significant expertise. Neural Architecture Search (NAS) frameworks offer a popular solution that helps democratize AI. However, these frameworks are often computationally expensive to run, which limits their applicability and accessibility. In this paper, we propose a novel transfer learning approach capable of effectively transferring pretrained supernets based on Optimal Transport or multi-dataset pretraining. The method applies generally to NAS methods based on Differentiable Architecture Search (DARTS). Through extensive experiments across dozens of image classification tasks, we demonstrate that transferring pretrained supernets in this way not only drastically speeds up the supernet training that finds optimal models (3 to 5 times faster on average), but can even yield architectures that outperform those found when running DARTS methods from scratch. We also observe positive transfer to almost all target datasets, demonstrating the robustness of the approach. Besides drastically improving the applicability of NAS methods, this also opens up new applications for continual learning and related fields.
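To make the warm-start idea concrete, the following is a minimal sketch (not the paper's actual implementation) of transferring one layer's supernet weights via entropy-regularized optimal transport: a Sinkhorn-computed transport plan matches source output channels to target channel anchors, and target weights are initialized by barycentric projection. All function names and the use of random anchors are illustrative assumptions.

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropy-regularized OT plan between uniform marginals (Sinkhorn iterations)."""
    n, m = cost.shape
    a, b = np.ones(n) / n, np.ones(m) / m      # uniform source/target marginals
    K = np.exp(-cost / reg)                    # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)                        # alternate scaling updates
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]         # transport plan; entries sum to 1

def transfer_layer_weights(w_src, out_tgt, seed=0):
    """Warm-start a target layer [out_tgt, in] from source weights [out_src, in].

    Hypothetical sketch: match source channels to random target anchors with OT,
    then set each target channel to a plan-weighted average of source channels.
    """
    rng = np.random.default_rng(seed)
    anchors = rng.standard_normal((out_tgt, w_src.shape[1]))
    cost = ((w_src[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()                   # normalize for numerical stability
    plan = sinkhorn(cost)
    plan = plan / plan.sum(axis=0, keepdims=True)  # columns sum to 1
    return plan.T @ w_src                      # barycentric projection
```

In a real supernet this projection would be applied per operation and per layer before resuming DARTS bilevel optimization on the target task, so the search starts from transported rather than random weights.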