🤖 AI Summary
Graph Domain Adaptation (GDA) confronts dual challenges: severe distribution shift between source and target domains, and limited computational resources. To address this, we propose GRADATE, a model-free graph data selection framework that identifies the source-domain samples with the greatest transferability for target-domain classification. GRADATE models graph distribution discrepancies via optimal transport theory, combining structural statistical distances with unsupervised sample importance scoring, and entirely bypasses GNN pretraining, inference, and fine-tuning. It is data-efficient, scalable, and broadly compatible with existing GDA methods. Extensive experiments across diverse real-world graph-level datasets and multiple covariate shift settings show that GRADATE consistently outperforms state-of-the-art data selection approaches and boosts mainstream GDA models while using significantly fewer source samples.
📝 Abstract
Graph domain adaptation (GDA) is a fundamental task in graph machine learning, where techniques such as shift-robust graph neural networks (GNNs) and specialized training procedures are used to tackle the distribution shift problem. Although these model-centric approaches show promising results, they often struggle with severe shifts and constrained computational resources. To address these challenges, we propose a novel model-free framework, GRADATE (GRAph DATa sElector), that selects the best training data from the source domain for the classification task on the target domain. GRADATE picks training samples without relying on any GNN model's predictions or training recipes, leveraging optimal transport theory to capture and adapt to distribution changes. GRADATE is data-efficient, scalable, and complementary to existing model-centric GDA approaches. Through comprehensive empirical studies on several real-world graph-level datasets and multiple covariate shift types, we demonstrate that GRADATE outperforms existing selection methods and enhances off-the-shelf GDA methods with far fewer training samples.
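Neither the summary nor the abstract spells out the selection procedure, so the snippet below is only a minimal sketch of how optimal-transport-based source selection can look in practice. It uses the POT library to score each source graph by the Wasserstein cost between its node features and a pooled target-domain feature cloud, then keeps the cheapest-to-transport graphs; the choice of statistics, the pooling step, and the scoring rule are illustrative assumptions, not the published GRADATE algorithm.

```python
# Hypothetical sketch: OT-based selection of transferable source graphs.
# Assumes each graph is represented by an (n_nodes, d) node-feature matrix.
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)


def transport_cost(source_feats: np.ndarray, target_feats: np.ndarray) -> float:
    """Exact OT cost between two point clouds under uniform weights."""
    M = ot.dist(source_feats, target_feats)  # squared-Euclidean cost matrix
    a = ot.unif(source_feats.shape[0])       # uniform mass on source nodes
    b = ot.unif(target_feats.shape[0])       # uniform mass on target nodes
    return ot.emd2(a, b, M)                  # optimal transport cost


def select_source_graphs(source_graphs, target_graphs, budget: int) -> np.ndarray:
    """Rank source graphs by OT cost to the pooled target node features
    and return the indices of the `budget` most transferable ones."""
    target_pool = np.vstack(target_graphs)   # pool all target node features
    scores = np.array([transport_cost(g, target_pool) for g in source_graphs])
    return np.argsort(scores)[:budget]       # lower cost = closer to target domain


# Toy usage with random node-feature matrices standing in for graphs.
rng = np.random.default_rng(0)
source = [rng.normal(size=(20, 8)) for _ in range(50)]
target = [rng.normal(loc=0.5, size=(20, 8)) for _ in range(10)]
chosen = select_source_graphs(source, target, budget=10)
print(chosen)
```

The key design point this sketch illustrates is that selection never touches a GNN: scores come purely from distributional distances between source and target graphs, so any downstream GDA model can be trained on the selected subset.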