🤖 AI Summary
To address irregular memory access bottlenecks arising from graph sparsity in GNN inference, this work proposes an event-driven FPGA acceleration architecture. It abandons conventional double-buffering, introducing instead a novel event-driven programming model jointly designed with node-level mixed-precision quantization to enable fine-grained, dynamic adaptation of computation and memory access. Custom data and instruction prefetchers are integrated to mitigate irregular memory access overhead. Evaluated on citation and social network datasets ranging from 2K to 700K nodes, the architecture achieves average speedups of 243× over CPU and 7.2× over GPU. This is the first work to systematically incorporate the event-driven paradigm into GNN hardware acceleration, significantly improving adaptability to irregular graph structures and energy efficiency.
📝 Abstract
Graph Neural Networks (GNNs) have recently gained attention due to their performance on non-Euclidean data. The use of custom hardware architectures proves particularly beneficial for GNNs due to their irregular memory access patterns, resulting from the sparse structure of graphs. However, existing FPGA accelerators are limited by their double buffering mechanism, which does not account for the irregular node distribution in typical graph datasets. To address this, we introduce **AMPLE** (Accelerated Message Passing Logic Engine), an FPGA accelerator leveraging a new event-driven programming flow. We develop a mixed-arithmetic architecture, enabling GNN inference to be quantized at node-level granularity. Finally, prefetchers for data and instructions are implemented to optimize off-chip memory access and maximize node parallelism. Evaluation on citation and social media graph datasets ranging from $2$K to $700$K nodes showed mean speedups of $243\times$ and $7.2\times$ against CPU and GPU counterparts, respectively.
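The node-level quantization granularity mentioned above can be illustrated with a minimal sketch: each node is assigned its own arithmetic precision (here, by degree, with hub nodes kept at higher precision), and its feature vector is quantized accordingly. The threshold, bit-widths, and degree-based policy are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def assign_node_precisions(degrees, hub_threshold=8):
    # Hypothetical policy: high-degree "hub" nodes keep 8-bit precision,
    # low-degree nodes are quantized more aggressively to 4 bits.
    return np.where(np.asarray(degrees) >= hub_threshold, 8, 4)

def quantize(x, bits):
    # Simple symmetric uniform quantization to the given bit-width.
    qmax = 2 ** (bits - 1) - 1
    peak = np.max(np.abs(x))
    scale = peak / qmax if peak > 0 else 1.0
    return np.round(x / scale).clip(-qmax, qmax) * scale

# Quantize each node's feature vector at its assigned precision.
rng = np.random.default_rng(0)
features = rng.standard_normal((5, 16))
degrees = [1, 2, 12, 3, 20]          # per-node in-degrees (example data)
bits = assign_node_precisions(degrees)
quantized = np.stack([quantize(f, b) for f, b in zip(features, bits)])
```

This captures only the numerical idea of mixed precision; in the accelerator itself the precision choice steers which arithmetic units process each node.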