🤖 AI Summary
Graph Domain Adaptation (GDA) confronts dual challenges: severe distribution shift between source and target domains, and limited computational resources. To address this, we propose GRADATE, a model-free graph data selection framework that identifies the source-domain samples with the greatest transferability for target-domain classification. GRADATE models graph distribution discrepancies via optimal transport theory, combining structural statistical distances with unsupervised sample importance scoring, and entirely bypasses GNN pretraining, inference, and fine-tuning. It is data-efficient, scalable, and broadly compatible with existing GDA methods. Extensive experiments across diverse real-world graph-level datasets and multiple covariate shift settings show that GRADATE consistently outperforms state-of-the-art data selection approaches and boosts mainstream GDA models while using significantly fewer source samples.
📝 Abstract
Graph domain adaptation (GDA) is a fundamental task in graph machine learning, where techniques such as shift-robust graph neural networks (GNNs) and specialized training procedures are used to tackle the distribution shift problem. Although these model-centric approaches show promising results, they often struggle with severe shifts and constrained computational resources. To address these challenges, we propose a novel model-free framework, GRADATE (GRAph DATa sElector), that selects the best training data from the source domain for the classification task on the target domain. GRADATE picks training samples without relying on any GNN model's predictions or training recipes, leveraging optimal transport theory to capture and adapt to distribution changes. GRADATE is data-efficient, scalable, and complementary to existing model-centric GDA approaches. Through comprehensive empirical studies on several real-world graph-level datasets and multiple covariate shift types, we demonstrate that GRADATE outperforms existing selection methods and enhances off-the-shelf GDA methods with far fewer training samples.
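Neither the summary nor the abstract spells out the selection procedure, so the snippet below is only a minimal sketch of how optimal-transport-based source selection can look in practice. It uses the POT library to score each source graph by the Wasserstein cost between its node features and a pooled target-domain feature cloud, then keeps the cheapest-to-transport graphs; the choice of statistics, the pooling step, and the scoring rule are illustrative assumptions, not the published GRADATE algorithm.

```python
# Hypothetical sketch: OT-based selection of transferable source graphs.
# Assumes each graph is represented by an (n_nodes, d) node-feature matrix.
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)


def transport_cost(source_feats: np.ndarray, target_feats: np.ndarray) -> float:
    """Exact OT cost between two point clouds under uniform weights."""
    M = ot.dist(source_feats, target_feats)  # squared-Euclidean cost matrix
    a = ot.unif(source_feats.shape[0])       # uniform mass on source nodes
    b = ot.unif(target_feats.shape[0])       # uniform mass on target nodes
    return ot.emd2(a, b, M)                  # optimal transport cost


def select_source_graphs(source_graphs, target_graphs, budget: int) -> np.ndarray:
    """Rank source graphs by OT cost to the pooled target node features
    and return the indices of the `budget` most transferable ones."""
    target_pool = np.vstack(target_graphs)   # pool all target node features
    scores = np.array([transport_cost(g, target_pool) for g in source_graphs])
    return np.argsort(scores)[:budget]       # lower cost = closer to target domain


# Toy usage with random node-feature matrices standing in for graphs.
rng = np.random.default_rng(0)
source = [rng.normal(size=(20, 8)) for _ in range(50)]
target = [rng.normal(loc=0.5, size=(20, 8)) for _ in range(10)]
chosen = select_source_graphs(source, target, budget=10)
print(chosen)
```

The key design point this sketch illustrates is that selection never touches a GNN: scores come purely from distributional distances between source and target graphs, so any downstream GDA model can be trained on the selected subset.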