Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators

📅 2021-06-11
🏛️ International Conference on Machine Learning
📈 Citations: 17
Influential: 7
🤖 AI Summary
In DNN accelerator design, the tight coupling among network architecture, quantization bitwidth, and hardware architecture leads to three problems: memory explosion in the joint search space, difficulty modeling the discrete accelerator design space, and a “chicken-and-egg” co-design dependency between software and hardware. To address these, the paper proposes Auto-NBA, the first joint search framework for all three dimensions. It (1) introduces a heterogeneous sampling strategy that enables unbiased search with constant memory overhead; (2) develops a differentiable, general-purpose accelerator search engine supporting end-to-end joint optimization; and (3) unifies hardware-aware neural architecture search (NAS) with quantization-aware training (QAT). In the reported experiments, the method outperforms state-of-the-art co-search techniques, hardware-aware NAS methods, and DNN accelerators in search time, task accuracy, and accelerator efficiency. The source code is publicly available.
📝 Abstract
While maximizing deep neural networks' (DNNs') acceleration efficiency requires a joint search/design of three different yet highly coupled aspects, including the networks, bitwidths, and accelerators, the challenges associated with such a joint search have not yet been fully understood and addressed. The key challenges include (1) the dilemma of whether to explode the memory consumption due to the huge joint space or achieve sub-optimal designs, (2) the discrete nature of the accelerator design space that is coupled yet different from that of the networks and bitwidths, and (3) the chicken-and-egg problem associated with network-accelerator co-search, i.e., co-search requires operation-wise hardware cost, which is lacking during search as the optimal accelerator depending on the whole network is still unknown during search. To tackle these daunting challenges towards optimal and fast development of DNN accelerators, we propose a framework dubbed Auto-NBA to enable jointly searching for the Networks, Bitwidths, and Accelerators, by efficiently localizing the optimal design within the huge joint design space for each target dataset and acceleration specification. Our Auto-NBA integrates a heterogeneous sampling strategy to achieve unbiased search with constant memory consumption, and a novel joint-search pipeline equipped with a generic differentiable accelerator search engine. Extensive experiments and ablation studies validate that both the Auto-NBA generated networks and accelerators consistently outperform state-of-the-art designs (including co-search/exploration techniques, hardware-aware NAS methods, and DNN accelerators), in terms of search time, task accuracy, and accelerator efficiency. Our code is available at: https://github.com/RICE-EIC/Auto-NBA.
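To make the idea of a differentiable search over a discrete accelerator design space more concrete, the sketch below relaxes a single discrete design choice with Gumbel-softmax sampling and updates its logits by gradient descent on an expected hardware cost. This is only an illustration under stated assumptions: the abstract does not specify the relaxation used by Auto-NBA, and the function names `gumbel_softmax` and `search`, the cost values, and the hyperparameters are all hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Sample relaxed one-hot weights (non-negative, summing to 1)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)      # clamp to avoid log(0)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)                     # subtract max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def search(costs, steps=200, lr=0.5, tau=1.0, seed=0):
    """Gradient descent on the logits of one discrete design choice.

    `costs[i]` is a hypothetical hardware cost (e.g. latency) of option i;
    in a real flow it would come from an accelerator performance model.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(costs)
    for _ in range(steps):
        weights = gumbel_softmax(logits, tau, rng)
        expected = sum(w * c for w, c in zip(weights, costs))
        # Analytic gradient of the relaxed cost:
        # d(expected)/d(logit_j) = w_j * (c_j - expected) / tau
        grads = [w * (c - expected) / tau for w, c in zip(weights, costs)]
        logits = [l - lr * g for l, g in zip(logits, grads)]
    return logits


if __name__ == "__main__":
    # Hypothetical costs for four candidate settings of one design parameter.
    option_costs = [4.0, 2.5, 3.0, 5.0]
    final_logits = search(option_costs)
    best = max(range(len(final_logits)), key=final_logits.__getitem__)
    print("logits:", final_logits, "selected option:", best)
```

Because the relaxed cost is a weighted average, the gradient never pushes the cheapest option's logit down or the most expensive option's logit up, so the logits tend to concentrate on low-cost options over time. A joint search would add a task-loss term and repeat this per design factor, which this sketch leaves out.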
Problem

Research questions and friction points this paper is trying to address.

DNN Optimization
Memory Constraints
Accelerator Design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Auto-NBA
Optimized DNN Design
Accelerator Efficiency