Intellectual Property in Graph-Based Machine Learning as a Service: Attacks and Defenses

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Graph machine learning as a service (GMLaaS) faces API-layer intellectual property (IP) threats—including model extraction, membership inference, and training data leakage—targeting both graph models and structural graph data. Method: This work introduces the first fine-grained IP threat and defense taxonomy tailored to graph machine learning (GML). We propose a systematic security evaluation framework, release a cross-domain benchmark dataset, and open-source PyGIP—a comprehensive evaluation library supporting over ten techniques, including graph neural networks, black-box attack analysis, model watermarking, and graph data desensitization. Contribution/Results: Evaluated on multiple real-world graph datasets, our defense strategies significantly enhance IP security for both models and graph data in open cloud environments. PyGIP enables standardized, reproducible experimentation and validation, advancing standardized and quantifiable IP protection in GML.
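The API-layer extraction threat described above can be illustrated with a minimal sketch. This is not PyGIP code and not the paper's method: it assumes a hypothetical GMLaaS endpoint (`victim_api`) that returns class probabilities for queried node features, simulated here by a fixed linear classifier standing in for a trained GNN. The attacker never sees the victim's weights, only query outputs, yet fits a surrogate that mimics the victim's decisions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical victim: a GMLaaS endpoint returning class probabilities
# for queried node feature vectors. Simulated by a fixed linear
# classifier (a stand-in for a trained GNN behind an API).
W_victim = rng.normal(size=(8, 3))

def victim_api(x):
    """Black-box query: the attacker observes only output probabilities."""
    logits = x @ W_victim
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Attacker step 1: craft query inputs (here, random node features).
queries = rng.normal(size=(500, 8))
probs = victim_api(queries)

# Attacker step 2: fit a surrogate by regressing log-probabilities,
# which recovers the victim's decision boundary up to a per-row shift.
targets = np.log(probs)
W_surrogate, *_ = np.linalg.lstsq(queries, targets, rcond=None)

# Fidelity: how often the surrogate agrees with the victim on fresh inputs.
fresh = rng.normal(size=(200, 8))
agree = np.mean(
    np.argmax(fresh @ W_surrogate, axis=1)
    == np.argmax(victim_api(fresh), axis=1)
)
```

A real GNN victim is nonlinear and the attacker must also approximate graph structure, so fidelity is lower in practice; the sketch only shows why probability-revealing APIs leak far more than top-1 labels.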

📝 Abstract
Graph-structured data, which captures non-Euclidean relationships and interactions between entities, is growing in scale and complexity. As a result, training state-of-the-art graph machine learning (GML) models has become increasingly resource-intensive, turning these models and data into invaluable Intellectual Property (IP). To address the resource-intensive nature of model training, graph-based Machine-Learning-as-a-Service (GMLaaS) has emerged as an efficient solution by leveraging third-party cloud services for model development and management. However, deploying such models in GMLaaS also exposes them to potential threats from attackers. Specifically, while the APIs within a GMLaaS system provide interfaces for users to query the model and receive outputs, they also allow attackers to exploit and steal model functionalities or sensitive training data, posing severe threats to the safety of these GML models and the underlying graph data. To address these challenges, this survey systematically introduces the first taxonomy of threats and defenses at the level of both the GML model and graph-structured data. Such a tailored taxonomy facilitates an in-depth understanding of GML IP protection. Furthermore, we present a systematic evaluation framework to assess the effectiveness of IP protection methods, introduce a curated set of benchmark datasets across various domains, and discuss their application scopes and future challenges. Finally, we establish an open-source, versatile library named PyGIP, which evaluates various attack and defense techniques in GMLaaS scenarios and facilitates the implementation of existing benchmark methods. The library resource can be accessed at: https://labrai.github.io/PyGIP. We believe this survey will play a fundamental role in intellectual property protection for GML and provide practical recipes for the GML community.
Problem

Research questions and friction points this paper is trying to address.

Protecting GML models from API-based attacks in MLaaS
Safeguarding graph-structured training data from theft
Systematically categorizing threats and defenses for GML IP
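The training-data-theft problem above includes membership inference: deciding whether a specific example was in a model's training set. A minimal loss-threshold sketch follows; it is an illustration, not the survey's method. An overfit logistic-regression "model" (standing in for a GNN node classifier, with hypothetical sizes) yields much lower per-example loss on training members than on non-members, and an attacker exploits that gap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated setting: a small classifier overfits its training nodes.
# An attacker with query access compares per-example loss to infer
# training membership. Sizes are illustrative only.
d, n = 50, 30
members = rng.normal(size=(n, d))
member_y = rng.integers(0, 2, size=n).astype(float)
non_members = rng.normal(size=(n, d))
non_member_y = rng.integers(0, 2, size=n).astype(float)

# Train logistic regression to near-convergence: with d > n the random
# labels are separable, so members are fit almost perfectly (overfitting).
w = np.zeros(d)
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-members @ w))
    w -= 0.5 * members.T @ (p - member_y) / n

def per_example_loss(x, y):
    """Cross-entropy loss the attacker can estimate from API outputs."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

m_loss = per_example_loss(members, member_y)
nm_loss = per_example_loss(non_members, non_member_y)

# Loss-threshold attack: examples with low loss are guessed to be members.
tau = (m_loss.mean() + nm_loss.mean()) / 2
attack_acc = 0.5 * (np.mean(m_loss < tau) + np.mean(nm_loss >= tau))
```

The attack succeeds only to the extent the model overfits, which is why generalization-gap reduction (e.g. regularization) doubles as a membership-inference defense.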
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes first taxonomy for GML threats and defenses
Introduces systematic evaluation framework for IP protection
Develops open-source library PyGIP for attack-defense evaluation
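Among the defenses the library covers is model watermarking. The sketch below illustrates the common trigger-set idea in a hypothetical form (it does not use PyGIP's API): the owner picks secret out-of-distribution inputs with owner-chosen labels, trains the model to memorize them, and later claims ownership if a suspect model reproduces those labels far above chance. Fine-tuning is simulated here by overriding the base model on trigger inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Secret trigger set: out-of-distribution inputs with owner-chosen labels.
# (Hypothetical construction; trigger design varies across schemes.)
triggers = rng.normal(loc=5.0, size=(20, 8))  # shifted far from normal data
trigger_labels = rng.integers(0, 3, size=20)

W = rng.normal(size=(8, 3))  # stand-in classifier weights

def base_model(x):
    """An independent (non-watermarked) classifier."""
    return np.argmax(x @ W, axis=1)

def watermarked_model(x):
    """Simulates a model fine-tuned to memorize the trigger set:
    behaves like base_model except on trigger inputs."""
    preds = base_model(x)
    for i, t in enumerate(triggers):
        hit = np.linalg.norm(x - t, axis=1) < 1e-6
        preds[hit] = trigger_labels[i]
    return preds

def verify_ownership(suspect_model, threshold=0.9):
    """Ownership claim succeeds if the suspect matches the secret trigger
    labels far above chance (1/3 for three classes)."""
    match = np.mean(suspect_model(triggers) == trigger_labels)
    return match >= threshold, match

wm_ok, wm_rate = verify_ownership(watermarked_model)
clean_ok, clean_rate = verify_ownership(base_model)
```

Real evaluations additionally test robustness: a useful watermark must survive surrogate training, fine-tuning, and pruning by the thief while leaving the model's normal accuracy intact.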