AI Summary
Non-binary LDPC decoding faces two bottlenecks, computational intensity and limited memory bandwidth: under conventional architectures, excessive data movement severely constrains throughput. To address this, we propose a near-memory decoding architecture based on Processing-in-Memory (PiM) and present the first hardware implementation of parallel bit-level non-binary LDPC decoding on the UPMEM PIM platform. The design uses thousands of lightweight processing elements (PEs) to perform bitwise operations and finite-field arithmetic directly in memory, greatly reducing data-movement overhead. In experiments, the decoder reaches a throughput of 76 Mbit/s, comparable to highly optimized edge-GPU implementations, while delivering substantial energy-efficiency gains. This work establishes a scalable PiM-based approach to high-throughput, low-latency non-binary LDPC decoding.
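The summary says the PEs perform finite-field arithmetic using bitwise operations. The sketch below illustrates why this suits lightweight cores. It multiplies two elements of GF(2^m) using only shifts, ANDs and XORs, following the classic shift-and-add method with modular reduction. The field GF(16) and the primitive polynomial x^4 + x + 1 are assumptions chosen for readability. The field size and element representation used in the paper are not given here, and this is not the authors' kernel.

```python
def gf_mul(a: int, b: int, m: int = 4, poly: int = 0b10011) -> int:
    """Multiply a and b in GF(2^m) with shifts, ANDs and XORs only.

    Assumed field: GF(16) built from the primitive polynomial
    x^4 + x + 1 (encoded as 0b10011). Inputs must lie in range(2**m).
    """
    result = 0
    overflow = 1 << m        # bit m signals a term of degree m
    while b:
        if b & 1:            # add (XOR) the current multiple of a
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a & overflow:     # reduce modulo the field polynomial
            a ^= poly
    return result


# alpha (encoded as 2) times alpha^3 (encoded as 8) is alpha^4 = x + 1
print(gf_mul(2, 8))  # 3
```

Addition in GF(2^m) is a plain XOR, so both field operations reduce to the kind of bit-level instructions the summary refers to.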
Abstract
Low-density parity-check (LDPC) codes are an important component of several communication and storage applications, offering a flexible and effective method for error correction. These codes are computationally complex, and decoding them within real-time constraints requires exploiting parallel processing. While advances in arithmetic and logic unit technology have steadily increased the performance of computing systems, memory technology has not kept pace. The result is a data-movement bottleneck that affects parallel processing systems especially severely. Several solutions have been proposed to alleviate this bottleneck, including the processing-in-memory (PiM) paradigm. PiM places compute units where (or near where) the data is stored and uses thousands of low-complexity processing units to perform bit-wise and simple arithmetic operations. This paper presents a novel, efficient near-memory non-binary LDPC decoder for the UPMEM system. To the best of our knowledge, it is the first PiM-based non-binary LDPC decoder running on real hardware. It is benchmarked against low-power GPU parallel solutions that are highly optimized for throughput. The PiM-based non-binary LDPC decoder achieves 76 Mbit/s of decoding throughput, which is competitive even with implementations running on edge GPUs.
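Iterative non-binary LDPC decoders commonly stop once the current hard decision c satisfies every parity check. That condition is that, for each row m of the parity-check matrix H, the sum over n of h_mn * c_n equals 0 in GF(q). A minimal sketch of this syndrome check follows. It assumes GF(16) with primitive polynomial x^4 + x + 1 and uses log/antilog tables for multiplication, a common software alternative to shift-and-XOR multiplication. The 2 x 4 matrix H is a hypothetical toy example, not a code from the paper. The check is not claimed to be how the authors' UPMEM decoder terminates.

```python
# Assumption: GF(16) generated by the primitive polynomial x^4 + x + 1.
M = 4
Q = 1 << M
POLY = 0b10011

# Build antilog (EXP) and log (LOG) tables by repeatedly multiplying by alpha = x.
EXP = [0] * (2 * (Q - 1))
LOG = [0] * Q
elem = 1
for i in range(Q - 1):
    EXP[i] = elem
    LOG[elem] = i
    elem <<= 1
    if elem & Q:          # reduce modulo the field polynomial
        elem ^= POLY
for i in range(Q - 1, 2 * (Q - 1)):
    EXP[i] = EXP[i - (Q - 1)]  # duplicate so LOG[a] + LOG[b] needs no modulo


def gf_mul(a: int, b: int) -> int:
    """Table-based multiplication in GF(16): alpha^i * alpha^j = alpha^(i+j)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndrome(H: list[list[int]], c: list[int]) -> list[int]:
    """Return one GF(16) syndrome symbol per row of H; all zeros means c is a codeword."""
    out = []
    for row in H:
        s = 0
        for h, sym in zip(row, c):
            s ^= gf_mul(h, sym)  # addition in GF(2^m) is XOR
        out.append(s)
    return out


# Hypothetical toy parity-check matrix over GF(16).
H = [[1, 2, 0, 3],
     [0, 1, 4, 1]]
print(syndrome(H, [13, 1, 1, 5]))  # [0, 0]: every check satisfied
print(syndrome(H, [13, 1, 1, 6]))  # [5, 3]: last symbol corrupted
```

In a real decoder, a nonzero syndrome would trigger another message-passing iteration rather than a direct correction. The check itself only needs the same XOR-based addition and table lookups.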