CIMNAS: A Joint Framework for Compute-In-Memory-Aware Neural Architecture Search

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of co-optimizing software and hardware for Compute-in-Memory (CIM) accelerators, this paper proposes CIMNAS, a first-of-its-kind end-to-end joint hardware-software search framework tailored to CIM architectures. CIMNAS integrates hardware-aware neural architecture search (HW-NAS), quantization-aware training, and circuit-level RRAM/SRAM modeling to jointly optimize neural network architectures, quantization policies, and hardware parameters, including array size and precision configuration, balancing energy efficiency, area, and accuracy. On MobileNet, CIMNAS reduces the Energy-Delay-Area Product (EDAP) by 90.1×–104.5×, improves throughput per watt (TOPS/W) by 4.68×–4.82×, and increases throughput per mm² (TOPS/mm²) by 11.3×–12.78× relative to various baselines, while maintaining 73.81% top-1 accuracy on ImageNet. Extended to an SRAM-based ResNet50, EDAP reduction reaches up to 819.5×. Unlike prior methods, CIMNAS achieves these EDAP-focused gains without any accuracy loss.
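As background on the headline metrics, the sketch below shows how EDAP, TOPS/W, and TOPS/mm² are conventionally derived from per-design energy, latency, and area estimates. The class and function names are illustrative assumptions, not the CIMNAS codebase's API.

```python
from dataclasses import dataclass

@dataclass
class HardwareEstimate:
    """Illustrative per-candidate estimates (hypothetical; not the CIMNAS API)."""
    energy_j: float    # energy per inference, in joules
    latency_s: float   # latency per inference, in seconds
    area_mm2: float    # accelerator area, in mm^2
    ops: float         # operations per inference (e.g., 2 * MACs)

def edap(est: HardwareEstimate) -> float:
    # Energy-Delay-Area Product: the paper's primary figure of merit (lower is better).
    return est.energy_j * est.latency_s * est.area_mm2

def tops_per_watt(est: HardwareEstimate) -> float:
    # Throughput (TOPS) divided by average power (W); simplifies to ops / (1e12 * energy).
    throughput_tops = est.ops / est.latency_s / 1e12
    avg_power_w = est.energy_j / est.latency_s
    return throughput_tops / avg_power_w

def tops_per_mm2(est: HardwareEstimate) -> float:
    # Areal throughput density: TOPS per mm^2 of silicon.
    return (est.ops / est.latency_s / 1e12) / est.area_mm2
```

A search loop would then minimize edap() over candidates subject to an accuracy floor, which matches the paper's claim of EDAP-focused optimization without accuracy loss.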

📝 Abstract
To maximize hardware efficiency and performance accuracy in Compute-In-Memory (CIM)-based neural network accelerators for Artificial Intelligence (AI) applications, co-optimizing both software and hardware design parameters is essential. Manual tuning is impractical due to the vast number of parameters and their complex interdependencies. To effectively automate the design and optimization of CIM-based neural network accelerators, hardware-aware neural architecture search (HW-NAS) techniques can be applied. This work introduces CIMNAS, a joint model-quantization-hardware optimization framework for CIM architectures. CIMNAS simultaneously searches across software parameters, quantization policies, and a broad range of hardware parameters, incorporating device-, circuit-, and architecture-level co-optimizations. CIMNAS experiments were conducted over a search space of 9.9×10^85 potential parameter combinations with the MobileNet model as a baseline and an RRAM-based CIM architecture. Evaluated on the ImageNet dataset, CIMNAS achieved a reduction in energy-delay-area product (EDAP) ranging from 90.1× to 104.5×, an improvement in TOPS/W between 4.68× and 4.82×, and an enhancement in TOPS/mm² from 11.3× to 12.78× relative to various baselines, all while maintaining an accuracy of 73.81%. The adaptability and robustness of CIMNAS are demonstrated by extending the framework to support an SRAM-based ResNet50 architecture, achieving up to an 819.5× reduction in EDAP. Unlike other state-of-the-art methods, CIMNAS achieves EDAP-focused optimization without any accuracy loss, generating diverse software-hardware parameter combinations for high-performance CIM-based neural network designs. The source code of CIMNAS is available at https://github.com/OlgaKrestinskaya/CIMNAS.
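To give a feel for how a joint space reaches sizes like 9.9×10^85, here is a minimal sketch of sampling one candidate from independent per-layer software/quantization choices plus global hardware knobs. All option lists and field names are hypothetical placeholders; the paper's actual space spans far more device-, circuit-, and architecture-level parameters.

```python
import random

# Hypothetical per-layer software/quantization choices and global hardware
# knobs; the real CIMNAS space is far richer than this toy example.
KERNEL_SIZES   = [3, 5, 7]
EXPAND_RATIOS  = [3, 4, 6]
WEIGHT_BITS    = [2, 4, 6, 8]
ACT_BITS       = [4, 6, 8]
ARRAY_SIZES    = [64, 128, 256, 512]   # CIM crossbar rows/cols
ADC_PRECISIONS = [4, 5, 6, 7, 8]

def sample_candidate(num_layers: int = 20) -> dict:
    """Draw one point from the joint space; the space size is the product
    of all per-layer and hardware option counts."""
    layers = [
        {
            "kernel": random.choice(KERNEL_SIZES),
            "expand": random.choice(EXPAND_RATIOS),
            "w_bits": random.choice(WEIGHT_BITS),
            "a_bits": random.choice(ACT_BITS),
        }
        for _ in range(num_layers)
    ]
    hardware = {
        "array_size": random.choice(ARRAY_SIZES),
        "adc_bits": random.choice(ADC_PRECISIONS),
    }
    return {"layers": layers, "hardware": hardware}

# Even these toy options yield (3*3*4*3)^20 * 4 * 5 ≈ 9×10^41 combinations
# for 20 layers, showing how joint spaces explode beyond exhaustive search.
```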
Problem

Research questions and friction points this paper is trying to address.

Automating neural network accelerator design for CIM architectures
Co-optimizing software, quantization, and hardware parameters simultaneously
Achieving hardware efficiency without sacrificing model accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Jointly optimizes software, quantization, and hardware parameters
Searches across device, circuit, and architecture levels (see the sketch after this list)
Automates design for compute-in-memory neural accelerators
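To make the three-level hardware search concrete, below is a hedged sketch of how device-, circuit-, and architecture-level knobs might be grouped into one searchable configuration. The specific fields are assumptions for illustration, not the framework's actual parameterization.

```python
from dataclasses import dataclass

@dataclass
class DeviceParams:
    # Device level: properties of the RRAM/SRAM cell itself.
    cell_type: str        # e.g., "RRAM" or "SRAM"
    bits_per_cell: int    # conductance levels encoded per device

@dataclass
class CircuitParams:
    # Circuit level: peripherals around each crossbar array.
    array_rows: int
    array_cols: int
    adc_bits: int
    dac_bits: int

@dataclass
class ArchitectureParams:
    # Architecture level: how arrays compose into the accelerator.
    tiles: int
    pes_per_tile: int
    buffer_kb: int

@dataclass
class HardwareConfig:
    device: DeviceParams
    circuit: CircuitParams
    arch: ArchitectureParams

# One point in the hardware sub-space; a search would sample many such
# configs and evaluate each with a circuit-level CIM model.
cfg = HardwareConfig(
    DeviceParams("RRAM", bits_per_cell=2),
    CircuitParams(array_rows=256, array_cols=256, adc_bits=6, dac_bits=2),
    ArchitectureParams(tiles=16, pes_per_tile=8, buffer_kb=64),
)
```

Grouping the knobs this way lets a search treat the full hardware stack as a single candidate while still attributing costs to the right abstraction level.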
🔎 Similar Papers
No similar papers found.
Olga Krestinskaya
King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Mohammed E. Fouda
Unknown affiliation
Ahmed Eltawil
Professor of ECE, Associate Dean of Research, King Abdullah University of Science and Technology
Wireless Communications · Next Generation Networks · Body Area Networks · Internet of Bodies
Khaled N. Salama
King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia