🤖 AI Summary
Jointly optimizing the hardware design space and the algorithm-to-hardware mapping space (the mapspace) suffers from combinatorial explosion, because both spaces are large and highly nonconvex. This paper proposes DOSA, the first framework to formulate co-search of hardware and mapping as a differentiable optimization problem. DOSA builds differentiable performance models and uses gradient descent to explore both spaces simultaneously. The framework is modular: its analytical model can be augmented with a learned model, which lets it optimize buffer sizes and mappings of a real DNN accelerator. Experiments show that, given a similar number of samples, DOSA outperforms random search and Bayesian optimization by 2.80× and 12.59×, respectively, in improving DNN model energy-delay product; on the real accelerator, it attains a 1.82× improvement in energy-delay product. The core contribution is a differentiable model that covers hardware parameters and mappings together, avoiding the limitations of approaches that explore the two spaces separately.
📝 Abstract
In hardware design space exploration, it is critical to optimize both hardware parameters and algorithm-to-hardware mappings. Previous work has largely approached this joint optimization problem by independently exploring the hardware design space and the mapspace, both of which are individually large and highly nonconvex. The resulting combinatorial explosion has created significant difficulties for optimizers.
In this paper, we introduce DOSA, which consists of differentiable performance models and a gradient descent-based optimization technique to simultaneously explore both spaces and identify high-performing design points. Experimental results demonstrate that DOSA outperforms random search and Bayesian optimization by 2.80x and 12.59x, respectively, in improving DNN model energy-delay product, given a similar number of samples. We also demonstrate the modularity and flexibility of DOSA by augmenting our analytical model with a learned model, allowing us to optimize buffer sizes and mappings of a real DNN accelerator and attain a 1.82x improvement in energy-delay product.
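As a rough illustration of the idea of descending on a differentiable performance model, the sketch below runs plain gradient descent over one hardware parameter (number of processing elements) and one mapping parameter (tile edge) at the same time. It is not DOSA's model: the workload, the cost formulas, every coefficient, and the names `edp`, `grad_log_edp`, and `co_search` are hypothetical, and central finite differences stand in for automatic differentiation so that the example needs only the Python standard library.

```python
import math

# Hypothetical toy workload and cost coefficients (all values are made up).
MACS = 256 * 256 * 256   # multiply-accumulates in one matrix multiply
DRAM_BW = 16.0           # off-chip words transferred per cycle
E_MAC = 1.0              # energy per MAC
E_BUF = 0.001            # buffer energy per MAC, per word of buffer capacity
E_DRAM = 200.0           # energy per off-chip word
P_STATIC = 0.5           # static power per processing element


def edp(log_pes, log_tile):
    """Toy energy-delay product as a smooth function of two log-parameters."""
    pes = math.exp(log_pes)      # hardware parameter: number of PEs
    tile = math.exp(log_tile)    # mapping parameter: edge of a square tile
    dram_words = 2.0 * MACS / tile           # larger tiles give more reuse
    # Compute and memory time are summed (not max'ed) to keep the model smooth.
    latency = MACS / pes + dram_words / DRAM_BW
    buffer_words = tile * tile               # the buffer must hold one tile
    energy = (MACS * (E_MAC + E_BUF * buffer_words)
              + E_DRAM * dram_words
              + P_STATIC * pes * latency)
    return energy * latency


def grad_log_edp(x, y, h=1e-5):
    """Central-difference gradient of log(EDP); a stand-in for autodiff."""
    gx = (math.log(edp(x + h, y)) - math.log(edp(x - h, y))) / (2 * h)
    gy = (math.log(edp(x, y + h)) - math.log(edp(x, y - h))) / (2 * h)
    return gx, gy


def co_search(log_pes, log_tile, lr=0.1, steps=2000):
    """Gradient descent that updates both parameters in every step."""
    for _ in range(steps):
        gx, gy = grad_log_edp(log_pes, log_tile)
        log_pes -= lr * gx
        log_tile -= lr * gy
    return log_pes, log_tile


start = (math.log(16.0), math.log(4.0))
best = co_search(*start)
# Round the continuous optimum to an integer design point (an assumption of
# this sketch, not a description of how DOSA handles discrete choices).
pes, tile = round(math.exp(best[0])), round(math.exp(best[1]))
print(f"PEs={pes}, tile={tile}, EDP improvement={edp(*start) / edp(*best):.1f}x")
```

Working in log-space keeps both parameters positive and, for this particular toy cost, makes the objective convex, so a fixed step size is enough. A realistic model would have many more parameters per layer and would need validity constraints on the mapping, which this sketch omits.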