AI Summary
To address computation errors caused by defective columns in Processing-Using-DRAM (PUD), this work proposes a high-precision column-level calibration method. The approach introduces, for the first time, a column-customized bias generation mechanism leveraging DDR4 DRAM's multi-level charge states; it achieves wide-range, high-resolution fine-grained bias compensation under row-resource constraints via multi-level charge programming and column-wise bias modeling. Crucially, the scheme is fully compatible with standard DDR4 protocols and requires no hardware modifications. Experimental results demonstrate a 55.2% reduction in erroneous columns, a 1.81× increase in usable computing columns, and 1.88× and 1.89× throughput improvements for PUD addition and multiplication, respectively. This is the first work to incorporate multi-level charge control into column-level PUD calibration, significantly enhancing the co-optimization of reliability and computational efficiency.
Abstract
Recently, practical analog in-memory computing has been realized using unmodified commercial DRAM modules. The underlying Processing-Using-DRAM (PUD) techniques enable high-throughput bitwise operations directly within DRAM arrays. However, the presence of inherent error-prone columns hinders PUD's practical adoption. While selectively using only error-free columns would ensure reliability, this approach significantly reduces PUD's computational throughput. This paper presents PUDTune, a novel high-precision calibration technique for increasing the number of error-free columns in PUD. PUDTune compensates for errors by applying pre-identified column-specific offsets to PUD operations. By leveraging multi-level charge states of DRAM cells, PUDTune generates fine-grained and wide-range offset variations despite the limited available rows. Our experiments with DDR4 DRAM demonstrate that PUDTune increases the number of error-free columns by 1.81$\times$ compared to conventional implementations, improving addition and multiplication throughput by 1.88$\times$ and 1.89$\times$, respectively.
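To illustrate why multi-level charge states help when only a few rows are available for calibration, the following toy model may be useful. It is a sketch under stated assumptions, not PUDTune's actual procedure: it assumes each calibration row contributes one charge level to a column, that the contributions simply add, and that the level values in `MULTI` are available. The function names, the level values, and the row count are all hypothetical.

```python
from itertools import product


def achievable_offsets(levels, num_rows):
    """Map each distinct total offset to one row assignment that produces it.

    Assumption: a column's offset is the plain sum of the charge levels
    stored in its calibration rows (a simplification for illustration).
    """
    table = {}
    for combo in product(levels, repeat=num_rows):
        total = round(sum(combo), 6)  # round to merge float duplicates
        table.setdefault(total, combo)
    return table


def pick_offset(target, levels, num_rows):
    """Return the achievable offset closest to `target` and its row assignment."""
    table = achievable_offsets(levels, num_rows)
    best = min(table, key=lambda offset: abs(offset - target))
    return best, table[best]


BINARY = (0.0, 1.0)                   # conventional: cell empty or fully charged
MULTI = (0.0, 0.25, 0.5, 0.75, 1.0)   # hypothetical multi-level charge states
NUM_ROWS = 3                          # hypothetical number of calibration rows

binary_table = achievable_offsets(BINARY, NUM_ROWS)
multi_table = achievable_offsets(MULTI, NUM_ROWS)

# A column whose (hypothetical) pre-identified offset is 1.3 units.
binary_choice, _ = pick_offset(1.3, BINARY, NUM_ROWS)
multi_choice, multi_rows = pick_offset(1.3, MULTI, NUM_ROWS)

print(len(binary_table), len(multi_table))
print(binary_choice, multi_choice, multi_rows)
```

In this model, both configurations cover the same range with the same three rows, but the multi-level one offers a finer grid of offsets, so the residual error for a given column is smaller. The real relationship between stored charge and the resulting offset on DDR4 hardware is likely more complex than a plain sum.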