GEN-Graph: Heterogeneous PIM Accelerator for General Computational Patterns in Graph-based Dynamic Programming

2026-04-12
Citations: 0
Influential: 0

AI Summary
This work addresses a fundamental computational incompatibility between matrix operations and graph traversal in graph dynamic programming, which hinders efficient execution on homogeneous in-memory architectures. To overcome this challenge, the authors propose GEN-Graph, a heterogeneous in-memory computing chip that, for the first time, enables scalable and exact solutions for general-purpose graph dynamic programming. The chip leverages 2.5D packaging to integrate processing-using-memory (PUM) units optimized for matrix computations and processing-near-memory (PNM) units tailored for graph traversal, guided by an algorithm-structure-aware hardware-software co-design. Experimental results demonstrate that the matrix unit achieves a 42.8× speedup and 392× higher energy efficiency over an H100 GPU on the all-pairs shortest paths (APSP) task, while the traversal unit delivers throughput of 2.56 million and 39,300 reads per second for short and long reads, respectively, up to 2.56× higher than existing accelerators.
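
To make the matrix-centric side concrete, here is a minimal software sketch of APSP as a dense min-plus dynamic program (the classic Floyd-Warshall recurrence). It is an illustration only: it does not model GEN-Graph's PUM mapping or its recursive partitioning, and the function name `floyd_warshall` and the example weights are assumptions introduced here.

```python
import math

def floyd_warshall(weights):
    """All-pairs shortest paths on a dense weight matrix (min-plus DP).

    weights[i][j] is the edge weight from i to j, math.inf if there is no
    edge, and 0 on the diagonal. Assumes the graph has no negative cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # work on a copy of the input
    for k in range(n):                  # allow node k as an intermediate hop
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue                # no path i -> k, nothing to relax
            row_i = dist[i]
            for j in range(n):
                cand = d_ik + row_k[j]
                if cand < row_i[j]:
                    row_i[j] = cand
    return dist

INF = math.inf
example = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
apsp = floyd_warshall(example)
print(apsp[1][0])  # 5, via the path 1 -> 2 -> 3 -> 0
```

Every (i, j) pair is updated with the same regular access pattern in each round, which is the kind of compute-bound regularity that the summary contrasts with graph traversal.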


Abstract
While graph-based dynamic programming (DP) is a cornerstone of genomics and network analytics, its efficiency is hampered by fundamentally conflicting computational patterns. Matrix-centric DP drives regular, compute-bound network analytics, while topology-centric DP handles irregular, memory-bound genomic traversals. These two categories of DP have substantially different computation patterns and dataflows, which makes it difficult for a single homogeneous processing-in-memory (PIM) architecture to efficiently support both. This work presents GEN-Graph, a novel heterogeneous PIM chiplet that integrates two types of specialized compute tiles within a 2.5D package: the matrix tile, a processing-using-memory (PUM) tile optimized for matrix-centric workloads such as all-pairs shortest path (APSP); and the traversal tile, a processing-near-memory (PNM) tile optimized for traversal-centric DP workloads such as DNA sequence alignment. Our hardware-software co-design employs recursive partitioning and reconfigurable windowed bit-parallel logic to ensure exact computation. Results show the matrix tile achieves a 42.8x speedup and 392x higher energy efficiency over the NVIDIA H100 GPU for APSP. For sequence-to-graph alignment, the traversal tile sustains 2.56 million reads/s (short reads) and 39.3 thousand reads/s (long reads), outperforming state-of-the-art accelerators by up to 2.56x in throughput. GEN-Graph provides the first scalable, exact solution for general DP dataflows by matching hardware specialization to algorithmic structure.
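
As a rough illustration of the traversal-centric side, the sketch below computes the exact edit distance between a read and the best-matching path in a small directed acyclic sequence graph. It uses a plain scalar DP, not the windowed bit-parallel logic described in the abstract. The function name `align_read_to_graph`, the one-character-per-node graph encoding, and the assumption that nodes are numbered in topological order are all hypothetical choices made for this example.

```python
def align_read_to_graph(read, labels, preds):
    """Minimum edit distance between `read` and any path in a DAG.

    labels[v] is the single character stored on node v and preds[v] lists
    the predecessors of v. Nodes are assumed to be in topological order.
    The alignment may start and end at any node.
    """
    m = len(read)
    # Virtual "empty path" column: read[:i] against nothing costs i insertions.
    start = list(range(m + 1))
    cols = []   # cols[v][i] = best cost of read[:i] against a path ending at v
    best = m    # cost of aligning the whole read to the empty path
    for v, ch in enumerate(labels):
        # A path may begin at v, so the virtual start is always a source.
        sources = [cols[u] for u in preds[v]] + [start]
        col = [min(p[0] for p in sources) + 1]      # node v left unmatched
        for i in range(1, m + 1):
            sub = 0 if read[i - 1] == ch else 1
            cost = col[i - 1] + 1                   # read char not in graph
            for p in sources:
                cost = min(cost,
                           p[i - 1] + sub,          # match or mismatch at v
                           p[i] + 1)                # graph char v skipped
            col.append(cost)
        cols.append(col)
        best = min(best, col[m])
    return best

# A small "bubble": A -> (C | G) -> T, so the full paths spell ACT and AGT.
labels = ["A", "C", "G", "T"]
preds = [[], [0], [0], [1, 2]]
print(align_read_to_graph("AGT", labels, preds))  # 0, exact match on A-G-T
print(align_read_to_graph("AT", labels, preds))   # 1
```

Each node's column depends on the columns of whichever predecessors the graph topology supplies, so the access pattern is data-dependent. This is the irregular, memory-bound behavior that the abstract attributes to topology-centric DP.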
Problem

Research questions and friction points this paper is trying to address.

graph-based dynamic programming
computational patterns
processing-in-memory
heterogeneous architecture
genomics
Innovation

Methods, ideas, or system contributions that make the work stand out.

heterogeneous PIM
graph-based dynamic programming
processing-in-memory
hardware-software co-design
computational specialization