Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators

📅 2025-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses accuracy degradation in compute-in-memory (CIM) accelerators caused by partial-sum quantization errors induced by low-precision analog-to-digital converters (ADCs) and by constraints on low-bit weight representation. We propose a column-level aligned co-quantization method for weights and partial-sums. Our approach unifies the quantization granularity of weights and partial-sums at the column level and introduces column-wise independent scaling factors to enhance process variation robustness. Departing from conventional two-stage training, we adopt end-to-end hardware-aware training to eliminate dequantization overhead. Integrated with CIM-oriented convolution, fine-grained tiling, and grouped convolution optimizations, our method achieves accuracy improvements of 0.99%, 2.69%, and 1.01% on ResNet-20 (CIFAR-10/100) and ResNet-18 (ImageNet), respectively, while significantly improving robustness against memory cell variations.

📝 Abstract
Compute-in-memory (CIM) is an efficient method for implementing deep neural networks (DNNs) but suffers from substantial overhead from analog-to-digital converters (ADCs), especially as ADC precision increases. Low-precision ADCs can reduce this overhead but introduce partial-sum quantization errors that degrade accuracy. Additionally, low-bit weight constraints, imposed by cell limitations and the need for multiple cells for higher-bit weights, present further challenges. While fine-grained partial-sum quantization has been studied to lower ADC resolution effectively, weight granularity, which limits overall partial-sum quantized accuracy, remains underexplored. This work addresses these challenges by aligning weight and partial-sum quantization granularities at the column-wise level. Our method improves accuracy while eliminating dequantization overhead, simplifies training by removing two-stage processes, and ensures robustness to memory cell variations via independent column-wise scale factors. We also propose an open-source CIM-oriented convolution framework to handle fine-grained weights and partial-sums efficiently, incorporating a novel tiling method and group convolution. Experimental results on ResNet-20 (CIFAR-10, CIFAR-100) and ResNet-18 (ImageNet) show accuracy improvements of 0.99%, 2.69%, and 1.01%, respectively, compared to the best-performing related works. Additionally, variation analysis reveals the robustness of our method against memory cell variations. These findings highlight the effectiveness of our quantization scheme in enhancing accuracy and robustness while maintaining hardware efficiency in CIM-based DNN implementations. Our code is available at https://github.com/jiyoonkm/ColumnQuant.
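The core idea in the abstract — quantizing each crossbar column's weights with its own scale factor and aligning the partial-sum (ADC) quantization to that same column granularity — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see their repository for that); the function names, bit widths, and per-column max-abs scaling rule here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_columnwise(W, n_bits=4):
    """Symmetric per-column weight quantization: each column of W
    (one CIM crossbar column) gets an independent scale factor."""
    qmax = 2 ** (n_bits - 1) - 1
    scales = np.abs(W).max(axis=0) / qmax          # one scale per column
    scales = np.where(scales == 0, 1.0, scales)    # guard against all-zero columns
    W_q = np.clip(np.round(W / scales), -qmax, qmax)
    return W_q.astype(np.int32), scales

def quantize_partial_sums(psums, scales, adc_bits=6):
    """Uniform partial-sum quantization (modeling a low-precision ADC),
    applied per column so its granularity matches the weight scales."""
    qmax = 2 ** (adc_bits - 1) - 1
    step = np.abs(psums).max(axis=0) / qmax        # per-column ADC step (assumed)
    step = np.where(step == 0, 1.0, step)
    p_q = np.clip(np.round(psums / step), -qmax, qmax)
    return p_q * step * scales                     # rescale back to real-valued output

X = rng.standard_normal((8, 16))       # a batch of input rows
W = rng.standard_normal((16, 32))      # weight matrix, 32 crossbar columns

W_q, s = quantize_columnwise(W, n_bits=4)
psums = X @ W_q                        # integer MACs accumulated per column
out = quantize_partial_sums(psums, s, adc_bits=6)

ref = X @ W
print("relative error:", np.linalg.norm(out - ref) / np.linalg.norm(ref))
```

Because both quantization steps operate column by column, a cell-level perturbation in one column only distorts that column's scale, which is the intuition behind the variation robustness claimed in the abstract.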
Problem

Research questions and friction points this paper is trying to address.

Partial-sum quantization errors from low-precision ADCs degrade accuracy
Mismatched weight and partial-sum quantization granularities limit accuracy
Memory cell variations undermine robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Column-wise weight quantization
Fine-grained partial-sum quantization
Open-source CIM convolution framework
Jiyoon Kim
Department of Artificial Intelligence, Sungkyunkwan University
Kang Eun Jeon
Department of Electrical and Computer Engineering, Sungkyunkwan University
Yulhwa Kim
Sungkyunkwan University
Neural Network, Deep Learning, Machine Learning, Hardware Accelerator, Next Generation Memory
Jong Hwan Ko
SungKyunKwan Univ. (SKKU)
Deep learning accelerator, Image/audio processing, VLSI/IoT systems design