🤖 AI Summary
This work addresses the limited generalization of existing in-memory computing accelerators, which are typically optimized for a single neural network. To overcome this limitation, we propose the first hardware-software co-optimization framework tailored for multi-workload scenarios. By employing an enhanced evolutionary algorithm, our approach jointly searches the RRAM/SRAM-based in-memory computing architecture and multi-model deployment strategies while explicitly modeling cross-task trade-offs. The resulting accelerator design achieves strong generality without sacrificing efficiency. Evaluated across four and nine concurrent workloads, our solution reduces the energy-delay-area product (EDAP) by up to 76.2% and 95.5%, respectively, compared to baseline designs. These results significantly narrow the performance gap between general-purpose and application-specific accelerators, demonstrating robustness and adaptability across diverse workloads.
📝 Abstract
Software-hardware co-design is essential for optimizing in-memory computing (IMC) hardware accelerators for neural networks. However, most existing optimization frameworks target a single workload, leading to highly specialized hardware designs that do not generalize well across models and applications. In contrast, practical deployment scenarios require a single IMC platform that can efficiently support multiple neural network workloads. This work presents a joint hardware-workload co-optimization framework based on an optimized evolutionary algorithm for designing generalized IMC accelerator architectures. By explicitly capturing cross-workload trade-offs rather than optimizing for a single model, the proposed approach significantly reduces the performance gap between workload-specific and generalized IMC designs. The framework is evaluated on both RRAM- and SRAM-based IMC architectures, demonstrating strong robustness and adaptability across diverse design scenarios. Compared to baseline methods, the optimized designs achieve energy-delay-area product (EDAP) reductions of up to 76.2% and 95.5% when optimizing across a small set (4 workloads) and a large set (9 workloads), respectively. The source code of the framework is available at https://github.com/OlgaKrestinskaya/JointHardwareWorkloadOptimizationIMC.
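The joint search described above can be sketched, at a very high level, as an evolutionary loop whose fitness aggregates EDAP across all concurrent workloads instead of a single model. Everything below is an illustrative assumption for intuition only: the design knobs (crossbar size, tile count, ADC bit-width), the toy workload descriptors, and the analytical cost model are hypothetical placeholders, not the framework's actual hardware model or search operators.

```python
import random

random.seed(0)

# Hypothetical IMC design space (illustrative, not the paper's model).
CROSSBAR_SIZES = [64, 128, 256, 512]
TILE_COUNTS = [4, 8, 16, 32]
ADC_BITS = [4, 6, 8]

# Toy per-workload demands: (MAC operations, parameter count) for each
# concurrently deployed network.
WORKLOADS = [(2e8, 5e6), (6e8, 2.5e7), (1e8, 1e6), (4e8, 1.2e7)]

def edap(cfg, workload):
    """Toy energy-delay-area product of one workload on one config."""
    xbar, tiles, bits = cfg
    macs, params = workload
    capacity = xbar * xbar * tiles          # weights that fit on-chip
    refills = max(1.0, params / capacity)   # extra passes if weights spill
    delay = macs / (xbar * tiles) * refills
    energy = macs * bits * 0.1 + params * refills
    area = xbar * xbar * tiles * 1e-4 + tiles * bits * 2.0
    return energy * delay * area

def fitness(cfg):
    """Cross-workload objective: mean EDAP over all concurrent workloads."""
    return sum(edap(cfg, w) for w in WORKLOADS) / len(WORKLOADS)

def mutate(cfg):
    """Re-sample one design knob at random."""
    xbar, tiles, bits = cfg
    choice = random.randrange(3)
    if choice == 0:
        xbar = random.choice(CROSSBAR_SIZES)
    elif choice == 1:
        tiles = random.choice(TILE_COUNTS)
    else:
        bits = random.choice(ADC_BITS)
    return (xbar, tiles, bits)

def evolve(pop_size=16, generations=50):
    """Keep the best half each generation; refill by mutating survivors."""
    pop = [(random.choice(CROSSBAR_SIZES),
            random.choice(TILE_COUNTS),
            random.choice(ADC_BITS)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[:pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=fitness)

best = evolve()
print("best config:", best, "mean EDAP:", fitness(best))
```

The key difference from single-workload co-design is confined to `fitness`: it averages EDAP over the whole workload set, so a configuration that excels on one network but stalls the others is penalized, which is the cross-workload trade-off the abstract refers to.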