🤖 AI Summary
This work addresses the challenge of efficiently optimizing deep neural network (DNN) inference on compute-in-memory (CIM) crossbar arrays. There, a high-dimensional, non-convex design space, shaped by model complexity and heterogeneous layer workloads, hinders effective exploration. To tackle this, we introduce a multi-objective Bayesian optimization (BO) framework for system-level co-design of CIM accelerators that jointly tunes hardware configurations and per-layer network parameters. Our approach efficiently navigates design spaces of up to 50 dimensions and ~10²⁷ possible configurations to identify Pareto-optimal trade-offs among accuracy, energy efficiency, and area. It combines high-dimensional modeling, layer-granular resource allocation, and a CIM-aware simulator. The method achieves 91.72% accuracy on VGG8/CIFAR-10 and 57.2% on VGG16/Tiny-ImageNet-200, comparable to baseline designs. It also reduces chip area by up to 65.52%, read dynamic energy by up to 52.07%, and read latency by up to 13.27%, and increases memory utilization by up to 13.41%.
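The "Pareto-optimal trade-offs" above can be made concrete with a small Python sketch of Pareto-dominance filtering. It assumes each candidate design is summarized as an (accuracy, energy, area) tuple, with accuracy maximized and energy and area minimized. The tuple layout, the function names `dominates` and `pareto_front`, and the example scores are hypothetical illustrations, not the framework's actual interface or reported results.

```python
from typing import List, Sequence, Tuple

# Hypothetical score layout: (accuracy to maximize, energy to minimize, area to minimize).
Design = Tuple[float, float, float]

def _as_minimization(d: Design) -> Tuple[float, float, float]:
    """Negate accuracy so that every objective is 'smaller is better'."""
    acc, energy, area = d
    return (-acc, energy, area)

def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    ma, mb = _as_minimization(a), _as_minimization(b)
    return all(x <= y for x, y in zip(ma, mb)) and any(x < y for x, y in zip(ma, mb))

def pareto_front(designs: Sequence[Design]) -> List[Design]:
    """Keep only designs that no other design dominates (a design never dominates itself)."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

if __name__ == "__main__":
    # Made-up scores for four hypothetical designs.
    designs = [(0.91, 10.0, 5.0), (0.90, 8.0, 4.0), (0.89, 9.0, 6.0), (0.92, 12.0, 5.5)]
    print(pareto_front(designs))
```

With these made-up scores, the third design is dropped because the second is better on all three objectives, while the remaining three are mutually non-dominated. A multi-objective BO loop would typically maintain such a front over the designs evaluated so far. How this particular framework represents and updates its front is not specified here.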
📝 Abstract
Leveraging the high density and energy efficiency of Compute-In-Memory (CIM) crossbar-based Deep Neural Network (DNN) accelerators requires optimal Design Space Exploration (DSE). This becomes increasingly challenging as complex models for advanced AI workloads expand the highly non-convex design space. Moreover, heterogeneous layer workloads (e.g., memory- vs. compute-bound) and differing learned representations across layers make layer-wise NN parameter allocation beneficial for efficiency. However, they also severely exacerbate design space complexity by increasing the number of parameters that must be tuned for simultaneous multi-objective optimization. Among existing DSE approaches, multi-objective Bayesian Optimization (BO) is promising because it finds high-quality design solutions while querying costly CIM simulators only selectively. In this work, we propose a multi-objective BO framework that holistically co-optimizes the hardware and algorithm parameters of a CIM crossbar-based hardware accelerator for various DNN inference tasks. Depending on NN model depth, our framework handles high-dimensional design spaces ($26$ and $50$ dimensions) and extremely large search complexities, on the order of $O(10^{12})$ and $O(10^{27})$, for VGG8/CIFAR-10 and VGG16/Tiny-ImageNet-200, respectively. Our method attains $91.72\%$ and $57.2\%$ accuracy, respectively, comparable to baseline designs. It also reduces chip area ($65.52\%$ and $50.7\%$), read latency ($9.52\%$ and $13.27\%$), and read dynamic energy ($31.23\%$ and $52.07\%$), and increases memory utilization ($13.41\%$ and $2.67\%$).
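The growth in search complexity from layer-wise parameter allocation can be illustrated with a short calculation. The sketch below assumes a hypothetical split into global hardware parameters, shared by the whole chip, and per-layer parameters, replicated for every layer. Each parameter is assumed to have a finite number of discrete choices. The function name `design_space_size` and all choice counts are made up and are not meant to reproduce the paper's $26$- and $50$-dimensional spaces. The function returns the number of tunable dimensions and the base-10 logarithm of the number of configurations.

```python
import math
from typing import Sequence, Tuple

def design_space_size(
    global_choices: Sequence[int],     # hypothetical: options per chip-wide hardware parameter
    per_layer_choices: Sequence[int],  # hypothetical: options per parameter tuned separately in each layer
    num_layers: int,
) -> Tuple[int, float]:
    """Return (number of dimensions, log10 of the number of configurations)."""
    # Every per-layer parameter becomes a separate dimension for each layer.
    dims = len(global_choices) + num_layers * len(per_layer_choices)
    # Work in log space, because the raw product grows exponentially with depth.
    log10_global = sum(math.log10(c) for c in global_choices)
    log10_per_layer = sum(math.log10(c) for c in per_layer_choices)
    return dims, log10_global + num_layers * log10_per_layer

if __name__ == "__main__":
    global_choices = [8, 4]     # two hypothetical chip-wide knobs
    per_layer_choices = [4, 2]  # two hypothetical per-layer knobs
    for layers in (8, 16):
        dims, log10_size = design_space_size(global_choices, per_layer_choices, layers)
        print(f"{layers} layers: {dims} dimensions, ~10^{log10_size:.1f} configurations")
```

With these illustrative counts, going from 8 to 16 layers raises the dimensionality from 18 to 34. The exponent of the configuration count nearly doubles, from about $10^{8.7}$ to about $10^{16.0}$. This is why exhaustive search quickly becomes infeasible and a sample-efficient method such as BO is attractive.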