FlexVector: A SpMM Vector Processor with Flexible VRF for GCNs on Varying-Sparsity Graphs

📅 2026-04-11

🤖 AI Summary
Graph Convolutional Network (GCN) inference suffers from irregular sparse-dense matrix multiplication (SpMM) due to power-law node degree distributions, which existing accelerators struggle to handle efficiently. This work proposes FlexVector, a vector processor architecture that integrates a row-wise product-based SpMM dataflow with a software-managed, flexible vector register file (VRF), eschewing multi-bank designs to maintain memory efficiency while accommodating irregular memory access patterns. Coupled with graph-aware preprocessing and node partitioning strategies, FlexVector enables synergistic hardware-software co-optimization. Evaluated on five real-world GCN datasets, FlexVector achieves a 3.78× speedup and 40.5% energy reduction over a state-of-the-art cache-centric baseline of comparable area.
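For context, the SpMM workload described above arises from the standard GCN layer. The formulation below is the widely used one from the GCN literature and is assumed here; the symbols are illustrative and not taken from the paper.

```latex
H^{(l+1)} = \sigma\left(\hat{A}\, H^{(l)}\, W^{(l)}\right),
\qquad
\hat{A} = \tilde{D}^{-1/2}\,(A + I)\,\tilde{D}^{-1/2}
```

Here $A$ is the graph adjacency matrix, $\tilde{D}$ is the diagonal degree matrix of $A + I$, $H^{(l)}$ holds the node features at layer $l$, $W^{(l)}$ is the dense weight matrix, and $\sigma$ is a nonlinearity. Row $i$ of $\hat{A}$ has one nonzero per neighbor of node $i$ plus one for the self-loop, so a power-law degree distribution translates directly into highly uneven row lengths in the sparse operand.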

📝 Abstract
Graph Convolutional Networks (GCNs) are widely adopted for tasks involving relational or graph-structured data, and their inference can be formulated as a two-stage sparse-dense matrix multiplication (SpMM). However, existing accelerators often struggle with the irregular workloads induced by power-law node degree distributions. In this work, we propose FlexVector, a vector-processor-based architecture that efficiently accelerates SpMM for GCN inference. To address irregular computation patterns, FlexVector adopts a row-wise product dataflow that regularizes SpMM execution and exposes vector parallelism through full-row access to vector registers, eliminating the need for multi-banked register file designs. Building on this dataflow, it introduces software-managed, flexible vector register files (VRFs) that adapt to irregular data access patterns without sacrificing memory access efficiency. To further exploit these architectural capabilities, we develop a graph-aware preprocessing and node partitioning strategy that restructures irregular graph workloads to better match the row-wise dataflow and the VRF capacity. This hardware-software co-design reduces memory traffic, leading to significant performance and energy efficiency gains on real-world GCN workloads. Experimental results on five real-world GCN datasets show that the VRF-centric FlexVector achieves a 3.78× speedup and 40.5% lower energy at comparable area relative to a state-of-the-art cache-centric baseline with buffers of the same size.
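
The row-wise product dataflow named in the abstract can be illustrated with a minimal software sketch. This is not the paper's implementation: the function name `spmm_row_wise` and the CSR argument names (`indptr`, `indices`, `data`) are assumptions chosen for illustration, and the pure-Python inner loop merely stands in for what a vector unit would do in a single multiply-accumulate over a full register row.

```python
def spmm_row_wise(indptr, indices, data, dense):
    """Compute C = A @ dense with A in CSR form, one output row at a time.

    Hypothetical helper for illustration only.
    indptr, indices, data: the usual CSR arrays of the sparse matrix A.
    dense: list of rows (lists of floats) of the dense matrix.
    """
    n_rows = len(indptr) - 1
    n_cols = len(dense[0]) if dense else 0
    out = []
    for i in range(n_rows):
        # One accumulator per output row; in hardware this would be a
        # vector register holding the partial sum for row i.
        acc = [0.0] * n_cols
        # Row i of A has indptr[i+1] - indptr[i] nonzeros, i.e. the node's
        # degree, which is why power-law graphs give uneven work per row.
        for k in range(indptr[i], indptr[i + 1]):
            a_ik = data[k]
            b_row = dense[indices[k]]  # full-row read of the dense operand
            # Scalar-times-vector multiply-accumulate over the whole row.
            for j in range(n_cols):
                acc[j] += a_ik * b_row[j]
        out.append(acc)
    return out


# A = [[1, 0, 2],
#      [0, 0, 0],
#      [0, 3, 0]] in CSR form
indptr = [0, 2, 2, 3]
indices = [0, 2, 1]
data = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

C = spmm_row_wise(indptr, indices, data, B)
print(C)  # [[11.0, 14.0], [0.0, 0.0], [9.0, 12.0]]
```

Each nonzero of the sparse matrix selects one whole row of the dense matrix, so every memory access to the dense operand is a contiguous row read. Only the number of such reads per output row varies, and it varies with node degree.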
Problem

Research questions and friction points this paper is trying to address.

Graph Convolutional Networks
SpMM
irregular workloads
power-law degree distribution
sparse-dense matrix multiplication
Innovation

Methods, ideas, or system contributions that make the work stand out.

SpMM
vector processor
flexible VRF
GCN acceleration
hardware-software co-design