TLV-HGNN: Thinking Like a Vertex for Memory-efficient HGNN Inference

📅 2025-08-11
🤖 AI Summary
Heterogeneous Graph Neural Network (HGNN) inference suffers from low memory efficiency during neighbor aggregation, caused by semantic-separated execution (which induces redundant intermediate-state storage) and overlapping neighborhoods across semantics (which lead to repeated feature loading and neighbor accesses). Method: We propose a semantics-complete execution paradigm that restructures computation from a vertex-centric perspective to eliminate intermediate storage and fuse shared neighbor accesses. We further design a vertex-grouping strategy to jointly optimize memory-access locality and build TLV-HGNN, a reconfigurable hardware accelerator. Results: Experiments show that TLV-HGNN achieves 7.85× and 1.41× average speedups over an NVIDIA A100 GPU and the state-of-the-art accelerator HiHGNN, respectively, while reducing energy consumption by 98.79% and 32.61%. The approach significantly improves the scalability and energy efficiency of HGNN inference.
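The core idea of the semantics-complete, vertex-centric paradigm can be illustrated with a minimal toy sketch (our reading of the idea, not the authors' implementation; the graph, semantic names, and features below are illustrative assumptions). Each target vertex walks all of its semantics in one pass, so no per-semantic buffer outlives the vertex, and neighbors shared across semantics are fetched once via a small per-vertex cache:

```python
# Toy sketch (an assumption about the paradigm's flavor, not the paper's code).
# Neighbor lists of each target vertex, per semantic (metapath).
neighbors = {
    "sem_A": {0: [1, 2], 1: [2, 3]},
    "sem_B": {0: [2, 3], 1: [3]},   # vertices 2 and 3 overlap with sem_A
}
features = {v: [float(v)] * 4 for v in range(4)}  # 4-dim toy features

loads = 0
fused = {}
for target in [0, 1]:          # one vertex at a time ("think like a vertex")
    cache = {}                 # dedupes cross-semantic neighbor loads
    acc = [0.0] * 4            # single accumulator, no per-semantic copy
    for sem in neighbors:
        for n in neighbors[sem].get(target, []):
            if n not in cache:
                loads += 1     # each shared neighbor is loaded only once
                cache[n] = features[n]
            acc = [a + x for a, x in zip(acc, cache[n])]
    fused[target] = acc        # semantic fusion (here: sum) completes per vertex

print(loads)  # → 5 feature loads; no per-semantic intermediate buffers survive
```

Because aggregation and fusion complete within each vertex's scope, the intermediate state is bounded by one accumulator rather than by the number of semantics.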

📝 Abstract
Heterogeneous graph neural networks (HGNNs) excel at processing heterogeneous graph data and are widely applied in critical domains. In HGNN inference, the neighbor aggregation stage is the primary performance determinant, yet it suffers from two major sources of memory inefficiency. First, the commonly adopted per-semantic execution paradigm stores intermediate aggregation results for each semantic prior to semantic fusion, causing substantial memory expansion. Second, the aggregation process incurs extensive redundant memory accesses, including repeated loading of target vertex features across semantics and repeated accesses to shared neighbors due to cross-semantic neighborhood overlap. These inefficiencies severely limit scalability and reduce HGNN inference performance. In this work, we first propose a semantics-complete execution paradigm from a vertex perspective that eliminates per-semantic intermediate storage and redundant target vertex accesses. Building on this paradigm, we design TLV-HGNN, a reconfigurable hardware accelerator optimized for efficient aggregation. In addition, we introduce a vertex grouping technique based on cross-semantic neighborhood overlap, together with its hardware implementation, to reduce redundant accesses to shared neighbors. Experimental results demonstrate that TLV-HGNN achieves average speedups of 7.85× and 1.41× over the NVIDIA A100 GPU and the state-of-the-art HGNN accelerator HiHGNN, respectively, while reducing energy consumption by 98.79% and 32.61%.
Problem

Research questions and friction points this paper is trying to address.

Memory inefficiency in HGNN neighbor aggregation stage
Redundant memory accesses due to repeated feature loading
Storage expansion from per-semantic intermediate results
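The friction points above can be made concrete with a minimal toy sketch of the conventional per-semantic execution (not the paper's code; the graph, semantic names, and features are illustrative assumptions). Every semantic is aggregated separately, so shared neighbors are reloaded per semantic and every per-semantic result must be materialized until fusion:

```python
# Toy sketch of the per-semantic baseline (illustrative assumptions throughout).
# Neighbor lists of each target vertex, per semantic (metapath).
neighbors = {
    "sem_A": {0: [1, 2], 1: [2, 3]},
    "sem_B": {0: [2, 3], 1: [3]},   # vertices 2 and 3 overlap with sem_A
}
features = {v: [float(v)] * 4 for v in range(4)}  # 4-dim toy features

loads = 0                 # count raw feature loads
intermediates = {}        # per-semantic buffers kept alive until fusion

for sem, nbrs in neighbors.items():
    intermediates[sem] = {}
    for target, nlist in nbrs.items():
        acc = [0.0] * 4
        for n in nlist:
            loads += 1                      # shared neighbors reloaded per semantic
            acc = [a + x for a, x in zip(acc, features[n])]
        intermediates[sem][target] = acc    # stored until all semantics finish

# Semantic fusion (here: plain sum) only happens after ALL semantics are done,
# so every per-semantic result above had to be materialized first.
fused = {
    t: [sum(col) for col in zip(*(intermediates[s][t] for s in neighbors))]
    for t in neighbors["sem_A"]
}
print(loads, len(intermediates))  # → 7 loads, 2 semantic buffers alive at once
```

Even on this four-vertex example, the overlapping neighbors (2 and 3) are fetched once per semantic, and the intermediate storage grows with the number of semantics.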
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantics-complete execution paradigm eliminates intermediate storage
Reconfigurable hardware accelerator optimizes efficient aggregation
Vertex grouping technique reduces redundant neighbor accesses
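The vertex-grouping idea in the last bullet can be sketched as follows (a greedy toy illustration of the flavor of the technique, not the paper's algorithm; the graph, threshold, and Jaccard criterion are our assumptions). Targets whose cross-semantic neighborhoods overlap heavily are grouped so that one fetch of the shared neighbors' features can serve the whole group:

```python
# Toy sketch (illustrative assumptions, not the paper's grouping algorithm).

def jaccard(a, b):
    """Overlap ratio of two neighbor sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Union of each target vertex's neighbors across all semantics.
nbhd = {0: {1, 2, 3}, 1: {2, 3}, 2: {5, 6}, 3: {5, 6, 7}}

groups = []
for v, ns in nbhd.items():
    for g in groups:
        # Join an existing group if overlap with its representative is high;
        # the 0.5 threshold is an arbitrary illustrative choice.
        if jaccard(ns, nbhd[g[0]]) >= 0.5:
            g.append(v)
            break
    else:
        groups.append([v])

print(groups)  # → [[0, 1], [2, 3]]
```

Processing a group together means its shared neighbors (here {2, 3} for the first group and {5, 6} for the second) are loaded once instead of once per member, which is the locality benefit the grouping targets.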
Dengke Han
Institute of Computing Technology, Chinese Academy of Sciences
graph-based hardware accelerator · high-throughput computer architecture
Duo Wang
State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
Mingyu Yan
State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
Xiaochun Ye
State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences
Dongrui Fan
Institute of Computing Technology, Chinese Academy of Sciences
Computer Architecture · Processor Design · Many-core Design