🤖 AI Summary
This work addresses the problem of Graph Neural Network (GNN) model stealing under extremely limited query budgets (e.g., ≤100 queries) and in the presence of common defenses. We propose the first black-box attack framework that reconstructs the victim GNN’s backbone architecture *without direct architectural queries*. Our method first infers the underlying architecture by leveraging output similarity patterns and graph-topological priors; it then employs an active learning strategy combined with adversarially robust graph sample selection to dynamically allocate scarce queries toward maximally discriminative graphs. Evaluated on eight real-world graph datasets, our approach achieves >92% architectural recovery accuracy and >89% functional equivalence using only 50–100 queries. Crucially, it remains effective against mainstream defenses—including output smoothing and gradient masking—demonstrating a substantial breakthrough in query efficiency over prior GNN model stealing methods.
📝 Abstract
Current graph neural network (GNN) model-stealing methods rely heavily on queries to the victim model, assuming no hard query limits. In reality, however, the number of allowed queries can be severely restricted. In this paper, we demonstrate how an adversary can extract a GNN with very limited interaction with the model. Our approach first enables the adversary to recover the model backbone without making direct queries to the victim model, and then to strategically spend a fixed query budget on extracting the most informative data. Experiments on eight real-world datasets demonstrate the effectiveness of the attack, even under a very restricted query limit and with defenses against model extraction in place. Our findings underscore the need for robust defenses against GNN model extraction threats.
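The query-allocation step described above can be illustrated with a minimal sketch. The paper's exact selection criterion is not specified here, so this example assumes a common active-learning heuristic: score each candidate graph by the entropy of the surrogate model's prediction and spend the fixed budget on the most uncertain (hence most informative) candidates. The function names and toy probabilities are hypothetical, not from the paper.

```python
import math

def prediction_entropy(probs):
    """Shannon entropy of a softmax output; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_queries(candidate_probs, budget):
    """Greedily pick the `budget` candidates whose surrogate
    predictions are most uncertain (highest entropy)."""
    ranked = sorted(range(len(candidate_probs)),
                    key=lambda i: prediction_entropy(candidate_probs[i]),
                    reverse=True)
    return ranked[:budget]

# Toy surrogate softmax outputs for 5 candidate graphs (hypothetical values)
candidates = [
    [0.98, 0.01, 0.01],  # confident prediction -> low query value
    [0.34, 0.33, 0.33],  # near-uniform -> high query value
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
    [0.90, 0.05, 0.05],
]
print(select_queries(candidates, budget=2))  # → [1, 2]
```

Under a hard cap of 50-100 queries, a selection rule like this concentrates the budget on inputs where the surrogate disagrees with itself most, which is the general intuition behind the attack's query efficiency.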