🤖 AI Summary
To address the trade-off between energy efficiency and accuracy that conventional quantization methods face on ReRAM-based compute-in-memory (CIM) architectures, this paper proposes a sensitivity-guided structured mixed-precision quantization paradigm. The method performs weight sensitivity analysis to dynamically allocate bit-widths across layers and channels, while jointly optimizing the mapping onto ReRAM crossbar arrays to maximize hardware utilization. Compared with fixed-precision baselines, the approach reduces power consumption by 40% and also lowers latency and computational load, while reaching 86.33% accuracy at 70% model compression. The key innovation is the first unified modeling of sensitivity-driven quantization, structured mixed-precision assignment, and crossbar mapping, which enables co-optimization of compression ratio, energy efficiency, and accuracy in ReRAM CIM systems.
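To make "sensitivity-guided bit-width allocation" concrete, here is a minimal sketch, not the paper's algorithm. It assumes a hypothetical sensitivity proxy (the mean squared quantization error of each output channel), a hypothetical candidate set of bit-widths, and a simple greedy rule. Starting with every channel at the highest precision, the rule repeatedly lowers the channel whose extra error per bit saved is smallest until the average bit-width meets a budget. The function names `quantize`, `channel_sensitivity`, and `allocate_bits` are illustrative only.

```python
import numpy as np

def quantize(w, bits):
    """Symmetric uniform quantization; the scale comes from the array's max magnitude."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax if np.any(w) else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def channel_sensitivity(w_channel, bits):
    """Hypothetical sensitivity proxy: mean squared quantization error at `bits`."""
    return float(np.mean((w_channel - quantize(w_channel, bits)) ** 2))

def allocate_bits(weights, candidate_bits=(8, 6, 4, 2), avg_bit_budget=4.0):
    """Greedy per-channel bit allocation under an average bit-width budget.

    weights: 2-D array of shape (out_channels, in_features).
    Returns one bit-width per output channel.
    """
    levels = sorted(candidate_bits, reverse=True)
    idx = [0] * weights.shape[0]  # every channel starts at the highest precision
    while np.mean([levels[i] for i in idx]) > avg_bit_budget:
        best, best_cost = None, float("inf")
        for c in range(len(idx)):
            if idx[c] + 1 >= len(levels):
                continue  # channel is already at the lowest precision
            hi, lo = levels[idx[c]], levels[idx[c] + 1]
            # Extra error per bit saved if this channel drops one precision level
            cost = (channel_sensitivity(weights[c], lo)
                    - channel_sensitivity(weights[c], hi)) / (hi - lo)
            if cost < best_cost:
                best, best_cost = c, cost
        if best is None:
            break  # budget unreachable with the given candidate bit-widths
        idx[best] += 1
    return [levels[i] for i in idx]
```

Under this proxy, channels with large weight magnitudes incur larger quantization error and therefore tend to keep more bits, while low-magnitude channels are pushed to lower precision first. A real sensitivity analysis would more likely measure the effect on the task loss rather than raw weight error.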
📝 Abstract
Compute-In-Memory (CIM) systems, particularly those utilizing ReRAM and memristive technologies, offer a promising path toward energy-efficient neural network computation. However, conventional quantization and compression techniques often fail to fully optimize performance and efficiency in these architectures. In this work, we present a structured quantization method that combines sensitivity analysis with mixed-precision strategies to enhance weight storage and computational performance on ReRAM-based CIM systems. Our approach improves ReRAM crossbar utilization, significantly reducing power consumption, latency, and computational load while maintaining high accuracy. Experimental results show 86.33% accuracy at 70% compression, alongside a 40% reduction in power consumption, demonstrating the method's effectiveness for power-constrained applications.
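The link between mixed precision and crossbar utilization can be illustrated with simple mapping arithmetic. The sketch below is an assumption-laden illustration, not the paper's mapping scheme. It uses bit slicing: a b-bit weight occupies ceil(b / bits_per_cell) adjacent columns, and input features run along the rows. The 128x128 crossbar size and 2-bit cells in `CrossbarSpec` are hypothetical defaults, and signed-weight handling (for example, differential column pairs) is ignored.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class CrossbarSpec:
    rows: int = 128         # hypothetical crossbar height (word lines)
    cols: int = 128         # hypothetical crossbar width (bit lines)
    bits_per_cell: int = 2  # hypothetical multi-level cell resolution

def map_layer(in_features, channel_bits, spec=CrossbarSpec()):
    """Bit-sliced mapping of one layer onto identical crossbar tiles.

    Each output channel with b-bit weights occupies ceil(b / bits_per_cell)
    columns; the layer's input features occupy the rows.
    Returns (number of tiles, fraction of allocated cells actually used).
    """
    cols_needed = sum(math.ceil(b / spec.bits_per_cell) for b in channel_bits)
    row_tiles = math.ceil(in_features / spec.rows)
    col_tiles = math.ceil(cols_needed / spec.cols)
    tiles = row_tiles * col_tiles
    used_cells = in_features * cols_needed
    utilization = used_cells / (tiles * spec.rows * spec.cols)
    return tiles, utilization

# Example: a layer with 128 inputs and 64 output channels
print(map_layer(128, [8] * 64))                       # uniform 8-bit
print(map_layer(128, [4] * 64))                       # uniform 4-bit
print(map_layer(128, [8] * 16 + [4] * 32 + [2] * 16)) # mixed precision
```

Under these assumptions, uniform 8-bit weights fill two tiles completely (2, 1.0), and uniform 4-bit weights fill one tile completely (1, 1.0). The mixed-precision assignment needs 144 columns, which spills into a second tile that is mostly empty (2, 0.5625). This shows why bit-width allocation and crossbar mapping benefit from being optimized together rather than separately.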