🤖 AI Summary
To address irregular memory access bottlenecks arising from graph sparsity in GNN inference, this work proposes an event-driven FPGA acceleration architecture. It abandons conventional double-buffering, introducing instead a novel event-driven programming model jointly designed with node-level mixed-precision quantization to enable fine-grained, dynamic adaptation of computation and memory access. Custom data and instruction prefetchers are integrated to mitigate irregular memory access overhead. Evaluated on citation and social network datasets ranging from 2K to 700K nodes, the architecture achieves average speedups of 243× over CPU and 7.2× over GPU. This is the first work to systematically incorporate the event-driven paradigm into GNN hardware acceleration, significantly improving adaptability to irregular graph structures and energy efficiency.
📝 Abstract
Graph Neural Networks (GNNs) have recently gained attention due to their performance on non-Euclidean data. The use of custom hardware architectures proves particularly beneficial for GNNs due to their irregular memory access patterns, resulting from the sparse structure of graphs. However, existing FPGA accelerators are limited by their double buffering mechanism, which does not account for the irregular node distribution in typical graph datasets. To address this, we introduce **AMPLE** (Accelerated Message Passing Logic Engine), an FPGA accelerator leveraging a new event-driven programming flow. We develop a mixed-arithmetic architecture, enabling GNN inference to be quantized at node-level granularity. Finally, prefetchers for data and instructions are implemented to optimize off-chip memory access and maximize node parallelism. Evaluation on citation and social media graph datasets ranging from $2$K to $700$K nodes showed mean speedups of $243\times$ and $7.2\times$ against CPU and GPU counterparts, respectively.
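The node-level quantization granularity mentioned above can be illustrated with a minimal sketch: each node is assigned its own arithmetic precision (here, by degree, with hub nodes kept at higher precision), and its feature vector is quantized accordingly. The threshold, bit-widths, and degree-based policy are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def assign_node_precisions(degrees, hub_threshold=8):
    # Hypothetical policy: high-degree "hub" nodes keep 8-bit precision,
    # low-degree nodes are quantized more aggressively to 4 bits.
    return np.where(np.asarray(degrees) >= hub_threshold, 8, 4)

def quantize(x, bits):
    # Simple symmetric uniform quantization to the given bit-width.
    qmax = 2 ** (bits - 1) - 1
    peak = np.max(np.abs(x))
    scale = peak / qmax if peak > 0 else 1.0
    return np.round(x / scale).clip(-qmax, qmax) * scale

# Quantize each node's feature vector at its assigned precision.
rng = np.random.default_rng(0)
features = rng.standard_normal((5, 16))
degrees = [1, 2, 12, 3, 20]          # per-node in-degrees (example data)
bits = assign_node_precisions(degrees)
quantized = np.stack([quantize(f, b) for f, b in zip(features, bits)])
```

This captures only the numerical idea of mixed precision; in the accelerator itself the precision choice steers which arithmetic units process each node.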