On the Safety of Graph Representation Learning

📅 2026-05-07

🤖 AI Summary

Existing evaluations of graph representation learning methods focus mainly on clean transfer, adaptation, and task coverage, and do not systematically test reliability under deployment stresses such as data corruption, distribution shift, and class imbalance. This work proposes GRL-Safety, a multi-axis safety benchmark covering corruption robustness, out-of-distribution generalization, class imbalance, fairness, and interpretation. It evaluates 12 representative methods (topology-only embeddings, supervised GNNs, self-supervised models, and graph foundation models) across 25 datasets under standardized conditions. The findings show that graph foundation models have axis-specific strengths rather than broad safety dominance, and that safety behavior depends on the interaction between representation design and the stressed graph factor rather than on method family alone. Several deployment regimes remain difficult even for the best evaluated method, pointing to a need for new robustness, adaptation, or training objectives that go beyond model selection.

📝 Abstract

Graph representation learning (GRL) has evolved from topology-only graph embeddings to task-specific supervised GNNs, and more recently to reusable representations and graph foundation models (GFMs). However, existing evaluations mainly measure clean transfer, adaptation, and task coverage. It remains unclear whether GRL methods stay reliable when deployment stresses affect graph signals, graph contexts, label support, structural groups, or predictive evidence. We introduce GRL-Safety, a multi-axis safety evaluation benchmark for GRL. GRL-Safety evaluates twelve representative methods, spanning topology-only embedding methods, supervised GNNs, self-supervised graph models, and GFMs, on twenty-five graph datasets under standardized evaluation conditions while preserving method-native adaptation. The evaluation covers five safety axes: corruption robustness, OOD generalization, class imbalance, fairness, and interpretation, with per-axis and sub-condition reporting rather than a single aggregate score. Our analysis yields three cross-axis insights that can inspire future research. First, safety behavior is shaped by the interaction between representation design and the stressed graph factor, rather than by method family alone. Second, foundation-era methods show axis-specific strengths rather than broad safety dominance. Third, several deployment regimes remain difficult even for the best evaluated method, revealing capability gaps that require new robustness, adaptation, or training objectives beyond model selection. The benchmark, evaluation protocols, and code are available at: https://github.com/GXG-CS/GRL-Safety.
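The abstract's choice of per-axis reporting over a single aggregate score can be illustrated with a minimal Python sketch. The method names (`method_A`, `method_B`), the score values, and the helpers `aggregate` and `per_axis_best` are all hypothetical. The scores are made-up placeholders, not GRL-Safety results, and the sketch does not reflect the benchmark's actual code, metrics, or scoring scale.

```python
from statistics import mean

# Hypothetical per-axis scores in [0, 1] (higher is better) for two
# placeholder methods. These are made-up toy values, NOT benchmark results.
SCORES = {
    "method_A": {"corruption": 0.90, "ood": 0.40, "imbalance": 0.85,
                 "fairness": 0.80, "interpretation": 0.55},
    "method_B": {"corruption": 0.70, "ood": 0.70, "imbalance": 0.70,
                 "fairness": 0.70, "interpretation": 0.70},
}


def aggregate(scores):
    """Collapse each method's axes into one mean score (hides per-axis detail)."""
    return {method: mean(axes.values()) for method, axes in scores.items()}


def per_axis_best(scores):
    """For each safety axis, return the method with the highest score."""
    axis_names = next(iter(scores.values())).keys()
    return {axis: max(scores, key=lambda m: scores[m][axis]) for axis in axis_names}


if __name__ == "__main__":
    print(aggregate(SCORES))      # both placeholder methods average about 0.70
    print(per_axis_best(SCORES))  # but each one leads on different axes
```

Both placeholder methods receive essentially the same mean score, yet the per-axis view shows `method_A` leading on corruption, imbalance, and fairness while `method_B` leads on OOD and interpretation. A single aggregate would hide exactly this kind of axis-specific pattern.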

Problem

Research questions and friction points this paper is trying to address.

Graph Representation Learning

Safety Evaluation

Robustness

Out-of-Distribution Generalization

Fairness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Representation Learning

Safety Benchmark

Graph Foundation Models