🤖 AI Summary
Existing neural architecture search (NAS) methods struggle to model the multidimensional hardware constraints of FPGAs, so their deployment cost predictions are inaccurate. To address this, the authors propose SNAC-Pack, an open-source AutoML framework that jointly optimizes neural architectures and FPGA resource utilization. The framework uses a hardware surrogate model in place of time-consuming synthesis, runs a multi-objective global search with Optuna and NSGA-II, and then applies quantization-aware training together with iterative magnitude pruning in a local search stage. A shared SQLite store lets parallel workers contribute trials across compute nodes, and hls4ml handles end-to-end deployment to FPGA firmware. Evaluated on LHC jet classification and superconducting qubit readout, the automatically generated compact architectures match or exceed baseline performance on the task metric while reducing FPGA resource utilization. In the qubit readout case, design space exploration drops from months of manual fine-tuning to hours of automated search.
📝 Abstract
Neural architecture search (NAS) is a powerful approach for automating model design, but existing methods often optimize for accuracy alone or rely on proxy metrics such as bit operations (BOPs) that correlate poorly with hardware cost. This gap is particularly large for FPGA deployment, where cost is dominated by a multi-dimensional budget of lookup tables, DSPs, flip-flops, BRAM, and latency. We present the Surrogate Neural Architecture Codesign Package (SNAC-Pack), an open-source AutoML framework for hardware-aware neural architecture codesign and end-to-end FPGA deployment. SNAC-Pack runs a multi-objective global search with Optuna and NSGA-II, writing trials to a shared SQLite store that enables parallel workers across compute nodes. A hardware surrogate model outputs per-trial resource and latency estimates, avoiding the synthesis cost that would otherwise dominate the search loop. A local search stage then applies quantization-aware training (QAT) together with iterative magnitude pruning in a combined compression loop, after which the final model is synthesized to FPGA firmware via the hls4ml Python library. A YAML configuration and an optional agentic frontend let users run the pipeline on new datasets without modifying the framework. We demonstrate SNAC-Pack on jet classification at the Large Hadron Collider and on superconducting qubit readout, discovering compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization. In the qubit readout case, SNAC-Pack also reduces design space exploration from months of manual fine-tuning to hours of automated search.
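To make the multi-objective idea concrete, the sketch below filters a handful of candidate architectures down to their Pareto front, using accuracy together with estimated lookup tables and latency. This is only the non-dominated selection principle that NSGA-II builds on, not NSGA-II itself and not SNAC-Pack's code. The `Trial` type, the candidate names, and every number are hypothetical stand-ins for what a hardware surrogate model might report.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    """One candidate architecture with hypothetical surrogate estimates."""
    name: str
    accuracy: float    # task metric, higher is better
    luts: int          # estimated lookup tables, lower is better
    latency_ns: float  # estimated latency, lower is better


def costs(t: Trial) -> tuple:
    """Express every objective as a cost to minimize."""
    return (-t.accuracy, t.luts, t.latency_ns)


def dominates(a: Trial, b: Trial) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    ca, cb = costs(a), costs(b)
    no_worse = all(x <= y for x, y in zip(ca, cb))
    strictly_better = any(x < y for x, y in zip(ca, cb))
    return no_worse and strictly_better


def pareto_front(trials: list) -> list:
    """Keep the trials that no other trial dominates."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]


# Illustrative numbers only; these are not results from the paper.
trials = [
    Trial("wide_mlp", 0.76, 42000, 95.0),
    Trial("narrow_mlp", 0.75, 9000, 60.0),
    Trial("tiny_mlp", 0.70, 4000, 45.0),
    Trial("bloated_mlp", 0.74, 50000, 120.0),
]

front = pareto_front(trials)
print([t.name for t in front])  # prints ['wide_mlp', 'narrow_mlp', 'tiny_mlp']
```

Here `bloated_mlp` is dropped because `wide_mlp` beats it on all three objectives, while the remaining three each win on at least one axis. A single scalar proxy such as BOPs would collapse this trade-off into one number, which is the limitation the abstract describes.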