AI Summary
This paper identifies and quantifies a significant performance heterogeneity phenomenon in Graph Neural Networks (GNNs) for graph-level classification and regression: identical models exhibit substantial performance variance across individual graph samples that cannot be explained by topological differences alone. To address this, we propose a heterogeneity measurement framework based on the Tree Mover's Distance (TMD), the first to jointly model graph topology and node feature distributions. We further design a data-aware selective rewiring strategy and a spectral-adaptive depth selection mechanism. Experiments demonstrate that our approach improves average graph classification accuracy by 2.3% and substantially reduces hyperparameter tuning overhead. We empirically validate that performance heterogeneity strongly correlates with inter-class distance ratios. Moreover, our spectrum-driven depth heuristic matches manually optimized layer counts across multiple benchmarks.
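The inter-class distance ratio mentioned above can be illustrated with a minimal sketch. The function name `class_distance_ratio`, the toy distance matrix, and the exact ratio definition below are illustrative assumptions, not the paper's implementation; in practice the pairwise distances would be Tree Mover's Distances between graphs.

```python
import numpy as np

def class_distance_ratio(D, labels, i):
    """Illustrative class-distance ratio for sample i: mean distance to
    same-class samples divided by mean distance to other-class samples.
    A low ratio suggests sample i sits well inside its own class."""
    labels = np.asarray(labels)
    same = labels == labels[i]
    same[i] = False  # exclude the zero self-distance
    other = labels != labels[i]
    return D[i, same].mean() / D[i, other].mean()

# Toy symmetric distance matrix for 4 graphs (e.g., precomputed TMD values).
D = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.5, 5.5],
    [4.0, 4.5, 0.0, 1.2],
    [5.0, 5.5, 1.2, 0.0],
])
labels = [0, 0, 1, 1]
print(class_distance_ratio(D, labels, 0))  # 1.0 / 4.5 ≈ 0.222
```

Under this toy definition, a correlation between such ratios and per-graph accuracy would indicate that graphs far from their own class (high ratio) are the ones models struggle with.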
Abstract
Graph Neural Networks have emerged as the most popular architecture for graph-level learning, including graph classification and regression tasks, which frequently arise in areas such as biochemistry and drug discovery. Achieving good performance in practice requires careful model design. Due to gaps in our understanding of the relationship between model and data characteristics, this often requires manual architecture and hyperparameter tuning. This is particularly pronounced in graph-level tasks, due to much higher variation in the input data than in node-level tasks. To work towards closing these gaps, we begin with a systematic analysis of individual performance in graph-level tasks. Our results establish significant performance heterogeneity in both message-passing and transformer-based architectures. We then investigate the interplay of model and data characteristics as drivers of the observed heterogeneity. Our results suggest that graph topology alone cannot explain heterogeneity. Using the Tree Mover's Distance, which jointly evaluates topological and feature information, we establish a link between class-distance ratios and performance heterogeneity in graph classification. These insights motivate model and data preprocessing choices that account for heterogeneity between graphs. We propose a selective rewiring approach, which only targets graphs whose individual performance benefits from rewiring. We further show that the optimal network depth depends on the graph's spectrum, which motivates a heuristic for choosing the number of GNN layers. Our experiments demonstrate the utility of both design choices in practice.
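As a rough illustration of a spectrum-dependent depth rule, the sketch below sizes the number of message-passing layers from the spectral gap of a graph's normalized Laplacian (a small gap means slow information mixing, suggesting more layers). The function `suggest_depth` and the inverse-gap rule are hypothetical stand-ins for intuition only, not the paper's actual heuristic.

```python
import numpy as np

def suggest_depth(A, max_depth=8):
    """Hypothetical heuristic: map the spectral gap of the symmetric
    normalized Laplacian L = I - D^{-1/2} A D^{-1/2} to a layer count,
    giving slow-mixing graphs (small gap) more layers."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigs = np.sort(np.linalg.eigvalsh(L))  # L is symmetric
    gap = eigs[1]  # second-smallest eigenvalue (spectral gap)
    return int(np.clip(np.ceil(1.0 / max(gap, 1e-6)), 1, max_depth))

# A complete graph mixes in one hop, so the heuristic suggests depth 1.
K4 = np.ones((4, 4)) - np.eye(4)
print(suggest_depth(K4))  # 1
```

A sparse, elongated graph (e.g., a path) has a smaller gap and would receive a larger suggested depth under the same rule.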