Accelerating Dynamic Image Graph Construction on FPGA for Vision GNNs

📅 2025-09-29

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Dynamic Image Graph Construction (DIGC) is the primary latency bottleneck in Vision Graph Neural Networks (ViGs), accounting for up to 95% of total inference time at high resolutions. Existing algorithmic optimizations typically compromise flexibility, accuracy, or generality. This paper introduces the first FPGA-based streaming, deeply pipelined architecture tailored for DIGC. It uses on-chip tiling and localized computation to reduce off-chip memory accesses, combines streaming local merge sort with global k-way merging via heap insertion to preserve accuracy, configurability, and scalability, and adapts across diverse ViG models and input resolutions. Post-place-and-route, the design achieves a high operating frequency. It delivers up to 16.6× and 6.8× speedups over optimized CPU and GPU implementations, respectively, offering an efficient hardware approach for real-time ViG deployment.


📝 Abstract

Vision Graph Neural Networks (Vision GNNs, or ViGs) represent images as unstructured graphs, achieving state-of-the-art performance in computer vision tasks such as image classification, object detection, and instance segmentation. Dynamic Image Graph Construction (DIGC) builds image graphs by connecting patches (nodes) based on feature similarity, and is dynamically repeated in each ViG layer following GNN-based patch (node) feature updates. However, DIGC constitutes over 50% of end-to-end ViG inference latency, rising to 95% at high image resolutions, making it the dominant computational bottleneck. While hardware acceleration holds promise, prior works primarily optimize graph construction algorithmically, often compromising DIGC flexibility, accuracy, or generality. To address these limitations, we propose a streaming, deeply pipelined FPGA accelerator for DIGC, featuring on-chip buffers that process input features in small, uniform blocks. Our design minimizes external memory traffic via localized computation and performs efficient parallel sorting with local merge sort and global k-way merging directly on streaming input blocks via heap insertion. This modular architecture scales seamlessly across image resolutions, ViG layer types, and model sizes and variants, and supports DIGC across diverse ViG-based vision backbones. The design achieves high clock frequencies post-place-and-route, owing to statically configured parallelism that minimizes critical-path delay, and delivers up to 16.6x and 6.8x speedups over optimized CPU and GPU DIGC baselines, respectively.
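To make concrete what DIGC computes, here is a minimal Python sketch of a brute-force k-nearest-neighbor graph over patch features. It is only an illustration under stated assumptions: the abstract says patches are connected "based on feature similarity" without naming a metric, so squared Euclidean distance, the exclusion of self-loops, and the names `squared_distance` and `build_knn_graph` are all hypothetical choices, not the paper's implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(features, k):
    """For each patch (node), return the indices of its k most similar patches.

    Brute force: every node is compared against every other node, which is
    why the cost grows quadratically with the number of patches.
    """
    graph = []
    for i, query in enumerate(features):
        # Score all other patches; self-loops are excluded here by assumption.
        scored = [
            (squared_distance(query, other), j)
            for j, other in enumerate(features)
            if j != i
        ]
        scored.sort()  # ascending distance, ties broken by lower index
        graph.append([j for _, j in scored[:k]])
    return graph


# Four toy "patches" with 2-D features: two tight pairs.
features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
graph = build_knn_graph(features, k=1)
print(graph)  # [[1], [0], [3], [2]]
```

Because the node features change after every ViG layer, this construction would have to be rerun per layer, which is consistent with the abstract's point that DIGC dominates latency at high resolutions.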

Problem

Research questions and friction points this paper is trying to address.

Accelerating dynamic image graph construction, the main bottleneck in Vision GNNs

Reducing computational latency in image patch connectivity

Enabling scalable FPGA acceleration without compromising flexibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

Streaming FPGA accelerator for dynamic image graph construction

On-chip buffers process features in small uniform blocks

Efficient parallel sorting via local merge sort and global k-way merging
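The block-wise sorting idea listed above can be sketched in software. The following Python example is a rough analogue, not the paper's hardware design: it reads candidate features in fixed-size blocks, sorts each block locally, and merges the results into a bounded heap that always holds the k best candidates seen so far. The function name `streaming_topk`, the `block_size` parameter, the squared Euclidean metric, and the use of Python's built-in `sorted` in place of a hardware merge sorter are all assumptions made for illustration.

```python
import heapq


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def streaming_topk(query, features, k, block_size):
    """Return the k nearest (distance, index) pairs, reading features block by block."""
    # heapq is a min-heap, so entries are stored negated: the root is then
    # the current WORST candidate (largest distance, then largest index).
    heap = []
    for start in range(0, len(features), block_size):
        block = features[start:start + block_size]
        # Local step: sort only this block (stand-in for a local merge sort).
        local = sorted(
            (squared_distance(query, f), start + offset)
            for offset, f in enumerate(block)
        )
        # Global step: insert the block's best candidates into the heap.
        for dist, idx in local[:k]:
            entry = (-dist, -idx)
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)  # evict the current worst
            else:
                break  # block is sorted, so later entries cannot qualify
    return sorted((-d, -i) for d, i in heap)


query = [0.0, 0.0]
features = [[0.0, 0.0], [3.0, 0.0], [1.0, 0.0], [5.0, 0.0], [2.0, 0.0], [0.5, 0.0]]
nearest = streaming_topk(query, features, k=3, block_size=2)
print([idx for _, idx in nearest])  # [0, 5, 2]
```

Only one block and k heap entries are held at a time, which mirrors the motivation for small on-chip buffers. The result is the same as a full sort of all candidates, so the blocking does not trade away accuracy.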

🔎 Similar Papers

Graph is all you need? Lightweight data-agnostic neural architecture search without training