🤖 AI Summary
Hardware-aware neural architecture search (HW-NAS) faces dual challenges: jointly optimizing accuracy and latency, and low search efficiency. Conventional supernet-based methods incur prohibitive computational overhead, while existing LLM-driven approaches suffer from an exploration bias that limits coverage of diverse architectures across latency regimes. This paper proposes PEL-NAS, an efficient LLM-driven HW-NAS framework. Its core innovations are: (1) complexity-driven partitioning of the search space to enforce diversity across latency regimes; and (2) a co-evolutionary mechanism for architecture generation and prompt engineering, integrating zero-cost predictors and knowledge-base-guided prompt refinement to mitigate the LLM's intrinsic architectural biases. Evaluated on HW-NAS-Bench, the method achieves higher hypervolume (HV), lower inverted generational distance (IGD), and up to 54% lower latency at similar accuracy, while compressing search time from days to minutes and improving both the efficiency and the diversity of cost-effective architectures discovered across latency ranges.
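For readers unfamiliar with the two quality metrics cited above, the sketch below illustrates how hypervolume and IGD are typically computed for a two-objective minimization setting (classification error, latency). The point values and reference point are made-up toy numbers for illustration only; this is not the paper's evaluation code.

```python
# Toy illustration of the two multi-objective metrics (HV, IGD) for a
# 2-D minimization setting (error, latency). Values below are made up.
import math

def hypervolume_2d(front, ref):
    """Area dominated by a non-dominated `front` w.r.t. reference point `ref`
    (both objectives minimized); larger is better."""
    pts = sorted(front)                  # ascending in objective 1
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

def igd(reference_front, solutions):
    """Mean distance from each reference-front point to its nearest found
    solution; smaller is better (the found set covers the front more closely)."""
    return sum(
        min(math.dist(r, s) for s in solutions) for r in reference_front
    ) / len(reference_front)

if __name__ == "__main__":
    found = [(0.10, 3.0), (0.08, 5.0), (0.06, 9.0)]        # (error, latency ms)
    true_front = [(0.09, 2.5), (0.07, 4.0), (0.05, 8.0)]   # hypothetical reference
    print("HV :", hypervolume_2d(found, ref=(0.20, 12.0)))
    print("IGD:", igd(true_front, found))
```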
📝 Abstract
Hardware-Aware Neural Architecture Search (HW-NAS) requires joint optimization of accuracy and latency under device constraints. Traditional supernet-based methods require multiple GPU days per dataset. Large Language Model (LLM)-driven approaches avoid training a large supernet and can provide quick feedback, but we observe an exploration bias: the LLM repeatedly proposes neural network designs within a limited region of the search space and fails to discover architectures across different latency ranges. To address this issue, we propose PEL-NAS: a search space Partitioned, architecture prompt co-Evolutionary, and LLM-driven Neural Architecture Search framework that generates neural networks with high accuracy and low latency at reduced search cost. PEL-NAS has three key components: 1) a complexity-driven partitioning engine that divides the search space by complexity to enforce diversity and mitigate exploration bias; 2) an LLM-powered architecture prompt co-evolution operator, in which the LLM first updates a knowledge base of design heuristics based on results from the previous round, then performs guided evolution on architectures using prompts that incorporate this knowledge base, so that prompts and designs improve together across rounds, avoiding random guesswork and improving efficiency; 3) a zero-cost predictor to avoid training a large number of candidates from scratch. Experimental results show that on HW-NAS-Bench, PEL-NAS achieves overall higher hypervolume (HV), lower inverted generational distance (IGD), and up to 54% lower latency than baselines at similar accuracy. Meanwhile, the search cost drops from days to minutes compared with traditional supernet baselines.
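To make the described pipeline concrete, here is a minimal, self-contained sketch of the outer loop the abstract outlines, assuming a toy cell-based search space. The LLM calls and the zero-cost predictor are stubbed placeholders (`llm_propose`, `update_knowledge_base`, `zero_cost_score` are hypothetical names, not the authors' implementation); the point is only to show how partitioning, prompt/knowledge-base co-evolution, and training-free scoring fit together.

```python
# Illustrative sketch of a PEL-NAS-style loop; all helpers are hypothetical stubs.
import random

OPS = ["conv3x3", "conv1x1", "skip", "pool", "none"]            # toy op set
COMPLEXITY = {"conv3x3": 3, "conv1x1": 2, "pool": 1, "skip": 0, "none": 0}

def complexity(arch):
    """Proxy for model complexity (e.g. parameter/FLOP count)."""
    return sum(COMPLEXITY[op] for op in arch)

def partition(pop, n_bins=3):
    """Complexity-driven partitioning: split candidates into bins so each
    complexity/latency regime keeps its own sub-population."""
    pop = sorted(pop, key=complexity)
    size = max(1, len(pop) // n_bins)
    return [pop[i:i + size] for i in range(0, len(pop), size)]

def zero_cost_score(arch):
    """Stand-in for a zero-cost predictor (training-free proxy); here a toy
    heuristic favouring diverse, moderately complex cells."""
    return len(set(arch)) + 0.1 * complexity(arch)

def update_knowledge_base(kb, scored):
    """LLM step 1 (stubbed): distil design heuristics from last round's results."""
    best = max(scored, key=lambda x: x[1])[0]
    kb.append(f"cells like {best} scored well")
    return kb

def llm_propose(parent, kb):
    """LLM step 2 (stubbed): guided mutation conditioned on a prompt that embeds
    the knowledge base; here a random mutation as a placeholder."""
    child = list(parent)
    child[random.randrange(len(child))] = random.choice(OPS)
    return child

def pel_nas_sketch(rounds=5, cells=4, pop_size=12):
    pop = [[random.choice(OPS) for _ in range(cells)] for _ in range(pop_size)]
    kb = []
    for _ in range(rounds):
        new_pop = []
        for sub in partition(pop):                     # search each complexity bin separately
            scored = [(a, zero_cost_score(a)) for a in sub]
            kb = update_knowledge_base(kb, scored)     # prompt / knowledge-base co-evolution
            children = [llm_propose(a, kb) for a, _ in scored]
            merged = scored + [(c, zero_cost_score(c)) for c in children]
            merged.sort(key=lambda x: x[1], reverse=True)
            new_pop.extend(a for a, _ in merged[:len(sub)])   # survivor selection per bin
        pop = new_pop
    return max(pop, key=zero_cost_score)

if __name__ == "__main__":
    print(pel_nas_sketch())
```

Keeping a separate sub-population per complexity bin is what prevents the collapse onto a single latency regime that the abstract calls exploration bias; in the real system the stubbed mutation would be an LLM prompted with the evolving knowledge base.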