GraphIP-Bench: How Hard Is It to Steal a Graph Neural Network, and Can We Stop It?

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Graph Neural Networks (GNNs) deployed in cloud services are vulnerable to model extraction attacks, yet no unified benchmark exists for evaluating their intellectual property (IP) protection. This work proposes GraphIP-Bench, a unified evaluation framework for GNN IP protection that covers homophilic, heterophilic, and large-scale graphs across three graph-learning tasks. Under a standardized black-box protocol, it systematically assesses 12 attack and 12 defense methods. The framework adds a joint attack-and-defense track that measures watermark verification on the extracted surrogate, uncovering defense failures that single-model evaluations miss. Experiments show that GNNs are easy to extract at medium query budgets and that most defenses do not change this. Several watermarks verify reliably on the protected model but lose most of their verification signal on the stolen replica. Heterophilic graphs are systematically harder to extract, and an architecture mismatch between target and surrogate reduces but does not prevent extraction.

📝 Abstract

Graph neural networks (GNNs) deployed as cloud services can be stolen through model-extraction attacks, which train a surrogate from query responses to reproduce the target's behaviour, and a growing line of ownership defenses tries to prevent or trace such theft. The title of this paper asks two questions: how hard is it to steal a GNN, and can we stop it? Prior work cannot answer either, because experiments use inconsistent datasets, threat models, and metrics. We introduce GraphIP-Bench, a unified benchmark that evaluates both sides under a single black-box protocol. It integrates twelve extraction attacks, twelve defenses spanning watermarking, output-perturbation, and query-pattern-detection families, ten public graphs covering homophilic, heterophilic, and large-scale regimes, three GNN backbones, and three graph-learning tasks, and it reports fidelity, task utility, ownership verification, and computational cost on shared splits, queries, and budgets. We further add a joint attack-and-defense track that runs every attack on every defended target and measures watermark verification on the resulting surrogate, which exposes the protection that a defense retains after extraction. The empirical picture is short: stealing a GNN is easy at medium query budgets and most defenses do not change this; several watermarks verify reliably on the protected model but lose most of their verification signal on the extracted surrogate, which exposes a gap that single-model evaluations miss; and heterophilic graphs are systematically harder to steal, while a cross-architecture mismatch between target and surrogate reduces but does not prevent extraction. Code: https://github.com/LabRAI/GraphIP-Bench
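
The abstract names fidelity, task utility, and ownership verification as reported metrics but does not give their formulas. The sketch below assumes the common label-agreement reading: fidelity is how often the surrogate matches the target, utility is how often the surrogate matches the ground truth, and watermark verification is how often a model reproduces the owner's trigger labels. The helper `agreement`, every variable name, and all prediction values are hypothetical illustrations, not the benchmark's code or results.

```python
from typing import Sequence


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions where two label sequences agree."""
    if len(a) != len(b):
        raise ValueError("label sequences must have equal length")
    if not a:
        raise ValueError("label sequences must be non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical predictions on six held-out nodes (illustrative values only).
true_labels = [0, 1, 2, 1, 0, 2]
target_preds = [0, 1, 2, 1, 1, 2]      # protected model behind the API
surrogate_preds = [0, 1, 2, 0, 1, 2]   # model trained from query responses

# Assumed definitions: fidelity compares the surrogate to the target,
# utility compares the surrogate to the ground truth.
fidelity = agreement(surrogate_preds, target_preds)
utility = agreement(surrogate_preds, true_labels)

# Hypothetical watermark trigger set: labels the owner embedded on four nodes.
trigger_labels = [2, 2, 0, 1]
target_on_trigger = [2, 2, 0, 1]
surrogate_on_trigger = [2, 0, 1, 1]

# Verification rate on the protected model versus the extracted surrogate.
wm_target = agreement(target_on_trigger, trigger_labels)
wm_surrogate = agreement(surrogate_on_trigger, trigger_labels)

# Signal lost through extraction; a single-model evaluation only sees wm_target.
retention_gap = wm_target - wm_surrogate

print(f"fidelity={fidelity:.3f} utility={utility:.3f}")
print(f"watermark: target={wm_target:.2f} surrogate={wm_surrogate:.2f} gap={retention_gap:.2f}")
```

Computing the verification rate on both models is what separates a joint track from a single-model evaluation in this sketch: a defense can score perfectly on `wm_target` while `wm_surrogate` drops.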

Problem

Research questions and friction points this paper is trying to address.

model extraction

graph neural networks

ownership protection

adversarial attacks

black-box attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

model extraction

graph neural networks

ownership verification