GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design

📅 2021-12-22
🏛️ International Symposium on High-Performance Computer Architecture
📈 Citations: 44 (2 influential)
🤖 AI Summary
To address the low inference efficiency of Graph Convolutional Networks (GCNs) on large-scale sparse graphs, whose power-law degree distributions cause irregular memory access patterns and poor data locality, this paper proposes GCoD, an algorithm and accelerator co-design framework. Methodologically, GCoD introduces (1) a split-and-conquer training strategy that polarizes local neighborhoods of the graph adjacency matrix into denser and sparser substructures without compromising accuracy, and (2) a dedicated two-pronged accelerator with separate engines for the denser and sparser workloads, integrated with on-chip dataflow optimization and memory-access compression. Evaluated on real-world graph datasets, GCoD achieves speedups of 15,286×, 294×, 7.8×, and 2.5× over CPU, GPU, HyGCN, and AWB-GCN, respectively, significantly reduces off-chip memory traffic, and maintains or even improves model accuracy.
📝 Abstract
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art graph learning model. However, performing inference with GCNs over large graph datasets can be notoriously challenging, limiting their application to large real-world graphs and hindering the exploration of deeper and more sophisticated GCN models. This is because real-world graphs can be extremely large and sparse. Furthermore, node degrees in GCN workloads tend to follow a power-law distribution, yielding highly irregular adjacency matrices, which results in prohibitive inefficiencies in both data processing and data movement and thus substantially limits the achievable GCN acceleration efficiency. To this end, this paper proposes a GCN algorithm and accelerator Co-Design framework dubbed GCoD which can largely alleviate the aforementioned GCN irregularity and boost GCNs’ inference efficiency. Specifically, on the algorithm level, GCoD integrates a split-and-conquer GCN training strategy that polarizes the graphs to be either denser or sparser in local neighborhoods without compromising the model accuracy, resulting in graph adjacency matrices that (mostly) have merely two levels of workload and enjoy largely enhanced regularity and thus ease of acceleration. On the hardware level, we further develop a dedicated two-pronged accelerator with a separate engine to process each of the aforementioned denser and sparser workloads, further boosting the overall utilization and acceleration efficiency. Extensive experiments and ablation studies validate that our GCoD consistently reduces the number of off-chip accesses, leading to speedups of 15,286×, 294×, 7.8×, and 2.5× as compared to CPUs, GPUs, and prior-art GCN accelerators including HyGCN and AWB-GCN, respectively, while maintaining or even improving the task accuracy. Additionally, we visualize GCoD-trained graph adjacency matrices for a better understanding of its advantages.
Problem

Research questions and friction points this paper is trying to address.

Accelerates GCN inference for large, sparse graphs
Addresses irregular adjacency matrices in GCNs
Co-designs algorithm and hardware for GCN efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Co-design of GCN algorithm and accelerator
Split and conquer GCN training strategy
Dedicated two-pronged accelerator for workloads
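The core idea behind the split-and-conquer strategy can be illustrated with a toy sketch: reorder the adjacency matrix so that high-degree (denser) rows are grouped together, splitting the work into a denser block and a sparser block that the two accelerator engines would process separately. This is a minimal illustration of the partitioning spirit only; the paper's actual method polarizes the graph during training, and the function name, `degree_threshold` parameter, and median-based split below are assumptions for the sketch.

```python
import numpy as np

def polarize_partition(adj, degree_threshold=None):
    """Toy sketch of dense/sparse workload splitting (not GCoD's
    trained polarization): symmetrically permute the adjacency
    matrix so high-degree nodes come first, then split the rows
    into a denser and a sparser block."""
    deg = adj.sum(axis=1)
    if degree_threshold is None:
        degree_threshold = np.median(deg)   # assumed split point
    order = np.argsort(-deg)                # high-degree nodes first
    adj_p = adj[order][:, order]            # symmetric permutation
    n_dense = int((deg > degree_threshold).sum())
    dense_block = adj_p[:n_dense]           # denser workload (engine 1)
    sparse_block = adj_p[n_dense:]          # sparser workload (engine 2)
    return dense_block, sparse_block, order

# Example: a small star-like graph with one high-degree hub (node 0)
adj = np.array([
    [0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
])
dense, sparse, order = polarize_partition(adj)
```

After the permutation, each block has a far more uniform per-row workload than the original matrix, which is what makes the two-pronged hardware design effective: each engine can be provisioned for one workload level instead of the full power-law spread.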
Haoran You, Georgia Institute of Technology (Efficient ML Algorithm-Hardware Co-Design)
Tong Geng, Pacific Northwest National Laboratory, Richland, WA
Yongan Zhang, Georgia Institute of Technology (Deep Learning, AI Accelerators)
Ang Li, Pacific Northwest National Laboratory, Richland, WA
Yingyan Lin, Rice University, Houston, TX