Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators

📅 2025-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses accuracy degradation in compute-in-memory (CIM) accelerators caused by partial-sum quantization errors induced by low-precision analog-to-digital converters (ADCs) and by constraints on low-bit weight representation. We propose a column-level aligned co-quantization method for weights and partial-sums. Our approach unifies the quantization granularity of weights and partial-sums at the column level and introduces column-wise independent scaling factors to enhance process variation robustness. Departing from conventional two-stage training, we adopt end-to-end hardware-aware training to eliminate dequantization overhead. Integrated with CIM-oriented convolution, fine-grained tiling, and grouped convolution optimizations, our method achieves accuracy improvements of 0.99%, 2.69%, and 1.01% on ResNet-20 (CIFAR-10/100) and ResNet-18 (ImageNet), respectively, while significantly improving robustness against memory cell variations.

📝 Abstract
Compute-in-memory (CIM) is an efficient method for implementing deep neural networks (DNNs) but suffers from substantial overhead from analog-to-digital converters (ADCs), especially as ADC precision increases. Low-precision ADCs can reduce this overhead but introduce partial-sum quantization errors that degrade accuracy. Additionally, low-bit weight constraints, imposed by cell limitations and the need for multiple cells for higher-bit weights, present further challenges. While fine-grained partial-sum quantization has been studied to lower ADC resolution effectively, weight granularity, which limits overall partial-sum quantized accuracy, remains underexplored. This work addresses these challenges by aligning weight and partial-sum quantization granularities at the column-wise level. Our method improves accuracy while eliminating dequantization overhead, simplifies training by removing two-stage processes, and ensures robustness to memory cell variations via independent column-wise scale factors. We also propose an open-source CIM-oriented convolution framework to handle fine-grained weights and partial-sums efficiently, incorporating a novel tiling method and group convolution. Experimental results on ResNet-20 (CIFAR-10, CIFAR-100) and ResNet-18 (ImageNet) show accuracy improvements of 0.99%, 2.69%, and 1.01%, respectively, compared to the best-performing related works. Additionally, variation analysis reveals the robustness of our method against memory cell variations. These findings highlight the effectiveness of our quantization scheme in enhancing accuracy and robustness while maintaining hardware efficiency in CIM-based DNN implementations. Our code is available at https://github.com/jiyoonkm/ColumnQuant.
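The core idea in the abstract — quantizing each crossbar column's weights with its own scale factor and aligning the partial-sum (ADC) quantization to that same column granularity — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see their repository for that); the function names, bit widths, and per-column max-abs scaling rule here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_columnwise(W, n_bits=4):
    """Symmetric per-column weight quantization: each column of W
    (one CIM crossbar column) gets an independent scale factor."""
    qmax = 2 ** (n_bits - 1) - 1
    scales = np.abs(W).max(axis=0) / qmax          # one scale per column
    scales = np.where(scales == 0, 1.0, scales)    # guard against all-zero columns
    W_q = np.clip(np.round(W / scales), -qmax, qmax)
    return W_q.astype(np.int32), scales

def quantize_partial_sums(psums, scales, adc_bits=6):
    """Uniform partial-sum quantization (modeling a low-precision ADC),
    applied per column so its granularity matches the weight scales."""
    qmax = 2 ** (adc_bits - 1) - 1
    step = np.abs(psums).max(axis=0) / qmax        # per-column ADC step (assumed)
    step = np.where(step == 0, 1.0, step)
    p_q = np.clip(np.round(psums / step), -qmax, qmax)
    return p_q * step * scales                     # rescale back to real-valued output

X = rng.standard_normal((8, 16))       # a batch of input rows
W = rng.standard_normal((16, 32))      # weight matrix, 32 crossbar columns

W_q, s = quantize_columnwise(W, n_bits=4)
psums = X @ W_q                        # integer MACs accumulated per column
out = quantize_partial_sums(psums, s, adc_bits=6)

ref = X @ W
print("relative error:", np.linalg.norm(out - ref) / np.linalg.norm(ref))
```

Because both quantization steps operate column by column, a cell-level perturbation in one column only distorts that column's scale, which is the intuition behind the variation robustness claimed in the abstract.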
Problem

Research questions and friction points this paper is trying to address.

Partial-sum quantization errors from low-precision ADCs degrade accuracy
Mismatched weight and partial-sum quantization granularities limit accuracy
Memory cell variations undermine robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Column-wise weight quantization
Fine-grained partial-sum quantization
Open-source CIM convolution framework
Jiyoon Kim
Department of Artificial Intelligence, Sungkyunkwan University
Kang Eun Jeon
Department of Electrical and Computer Engineering, Sungkyunkwan University
Yulhwa Kim
Sungkyunkwan University
Neural Network, Deep Learning, Machine Learning, Hardware Accelerator, Next Generation Memory
Jong Hwan Ko
SungKyunKwan Univ. (SKKU)
Deep learning accelerator, Image/audio processing, VLSI/IoT systems design