🤖 AI Summary
Graph computation suffers from performance bottlenecks on conventional parallel architectures due to the irregular memory access patterns and severe load imbalance inherent in real-world graphs. To address these challenges, this paper proposes UpDown, a fine-grained programmable architecture co-designed across hardware and software to optimize graph traversal and iterative computation. UpDown supports efficient execution of multiple variants of key graph algorithms, including PageRank and BFS, while preserving high programmability. Evaluated on RMAT-generated graphs, with projections to 33 million processing lanes, UpDown achieves 637K GTEPS for PageRank and 989K GTEPS for BFS, representing 5× and 100× speedups over prior state-of-the-art results, respectively. By fundamentally rethinking architectural support for irregular workloads, UpDown establishes a new paradigm for scalable, high-performance graph processing.
📝 Abstract
Large-scale graph problems are of critical and growing importance, yet parallel architectures have historically provided little support for them. In the spirit of co-design, we explore the question: how fast can graph computing go on a fine-grained architecture? We explore the possibilities of an architecture optimized for fine-grained parallelism, natural programming, and the irregularity and skew found in real-world graphs. Using two graph benchmarks, PageRank (PR) and Breadth-First Search (BFS), we evaluate a fine-grained graph architecture, UpDown, to explore what performance co-design can achieve. To demonstrate programmability, we wrote five variants of these algorithms. Simulations of up to 256 nodes (524,288 lanes) and projections to 16,384 nodes (33M lanes) show the UpDown system can achieve 637K GTEPS on PR and 989K GTEPS on BFS for RMAT graphs, exceeding the best prior results by 5× and 100×, respectively.