Shapley-Guided Utility Learning for Effective Graph Inference Data Valuation

📅 2025-03-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Assessing the importance of a test node's neighbors during GNN inference is hindered by the absence of ground-truth labels, which makes it hard to quantify node-level data value reliably. To address this, we propose a Shapley value prediction framework that needs no test labels: it treats Shapley values as regression targets and learns to approximate them directly from precomputed features. Our method jointly encodes data-specific characteristics (e.g., graph topology and node attributes) and model-specific properties (e.g., GNN architecture and learned parameters), which addresses two limitations of prior indirect-optimization approaches: poor generalizability and suboptimal calibration. Experiments on multiple benchmark graph datasets show that the approach consistently outperforms existing methods in both transductive and inductive settings, ranking neighbor importance more accurately and improving downstream task performance.

📝 Abstract

Graph Neural Networks (GNNs) have demonstrated remarkable performance in various graph-based machine learning tasks, yet evaluating the importance of the neighbors of test nodes remains largely unexplored because it is difficult to assess data importance without test labels. To address this gap, we propose Shapley-Guided Utility Learning (SGUL), a novel framework for graph inference data valuation. SGUL combines transferable data-specific and model-specific features to approximate test accuracy without relying on ground-truth labels. By incorporating Shapley values as a preprocessing step and using feature Shapley values as input, our method enables direct optimization of Shapley value prediction while reducing computational demands. SGUL overcomes key limitations of existing methods, including poor generalization to unseen test-time structures and indirect optimization. Experiments on diverse graph datasets demonstrate that SGUL consistently outperforms existing baselines in both inductive and transductive settings. SGUL offers an effective, efficient, and interpretable approach for quantifying the value of test-time neighbors.
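
To make "quantifying the value of test-time neighbors" concrete, the sketch below estimates Shapley values by permutation sampling, a standard Monte Carlo estimator. It is not the paper's implementation: the function name `monte_carlo_shapley`, the toy neighbor names, and the additive toy utility are all assumptions made for illustration. In the setting described above, the utility would instead be a learned, label-free estimate of test accuracy.

```python
import random
from typing import Callable, Dict, FrozenSet, Hashable, Iterable


def monte_carlo_shapley(
    players: Iterable[Hashable],
    utility: Callable[[FrozenSet[Hashable]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> Dict[Hashable, float]:
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    players = list(players)
    values = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = players[:]
        rng.shuffle(order)
        coalition = set()
        prev = utility(frozenset(coalition))
        for p in order:
            # Marginal contribution of p given the players added before it.
            coalition.add(p)
            cur = utility(frozenset(coalition))
            values[p] += cur - prev
            prev = cur
    return {p: v / num_permutations for p, v in values.items()}


# Hypothetical toy game: each neighbor contributes a fixed amount to utility.
toy_weights = {"a": 0.1, "b": 0.3, "c": 0.6}


def toy_utility(coalition: FrozenSet[Hashable]) -> float:
    return sum(toy_weights[p] for p in coalition)


toy_values = monte_carlo_shapley(toy_weights.keys(), toy_utility, num_permutations=50)
```

For an additive game like this one, every permutation yields the same marginal gains, so the estimate matches the per-neighbor weights up to floating-point error. Real utilities are not additive, which is why the number of sampled permutations matters in practice.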

Problem

Research questions and friction points this paper is trying to address.

Evaluating neighbor importance in GNNs without test labels

Approximating test accuracy without labels using transferable data-specific and model-specific features

Optimizing Shapley value prediction for efficient data valuation
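
As a rough illustration of approximating test accuracy without labels, the sketch below derives two label-free statistics from model outputs and combines them linearly into a utility score. The specific features (mean maximum softmax confidence and mean negative entropy), the function names, and the linear form are assumptions chosen for the example; the source only states that data-specific and model-specific features are used.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one node's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_free_features(logits_per_node: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Summarize predictions on test nodes without using any labels."""
    confidences, neg_entropies = [], []
    for logits in logits_per_node:
        probs = softmax(logits)
        confidences.append(max(probs))
        # Negative entropy: closer to zero means a more peaked prediction.
        neg_entropies.append(sum(p * math.log(p) for p in probs if p > 0))
    n = len(logits_per_node)
    return {
        "mean_max_confidence": sum(confidences) / n,
        "mean_neg_entropy": sum(neg_entropies) / n,
    }


def predicted_utility(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical linear utility: a weighted sum of the label-free features."""
    return sum(weights[name] * value for name, value in features.items())
```

The weights would have to be fitted on data where labels are available (for example, a validation split) and then reused at test time; that fitting step is not shown here.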

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines transferable data-specific and model-specific features

Computes feature Shapley values in a preprocessing step and uses them as model input

Directly optimizes Shapley value prediction
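
One way to see why feature Shapley values can serve as input is the linearity axiom of the Shapley value: if a utility is a weighted sum of feature functions, its Shapley values are the same weighted sum of the per-feature Shapley values. The sketch below checks this on a three-player toy game. The feature functions, weights, and player names are hypothetical, and whether SGUL's utility model is exactly linear is an assumption here, not something the summary above states.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, List


def exact_shapley(
    players: List[str], game: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every ordering (small player sets only)."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition: FrozenSet[str] = frozenset()
        prev = game(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = game(coalition)
            values[p] += cur - prev
            prev = cur
    return {p: v / len(orderings) for p, v in values.items()}


neighbors = ["n1", "n2", "n3"]


# Two hypothetical feature functions defined on coalitions of neighbors.
def feature_size(coalition: FrozenSet[str]) -> float:
    return len(coalition) ** 0.5


def feature_has_n1(coalition: FrozenSet[str]) -> float:
    return 1.0 if "n1" in coalition else 0.0


feature_weights = [0.7, -0.2]  # hypothetical learned weights


def linear_utility(coalition: FrozenSet[str]) -> float:
    return (feature_weights[0] * feature_size(coalition)
            + feature_weights[1] * feature_has_n1(coalition))


# Preprocessing: Shapley values of each feature, computed once.
phi_per_feature = [exact_shapley(neighbors, f) for f in (feature_size, feature_has_n1)]

# Shapley values of the utility, obtained as a weighted sum of feature Shapley values.
phi_from_features = {
    p: sum(w * phi[p] for w, phi in zip(feature_weights, phi_per_feature))
    for p in neighbors
}

# Reference: Shapley values computed directly on the combined utility.
phi_direct = exact_shapley(neighbors, linear_utility)
```

Under this linear assumption, fitting the weights against target Shapley values is an ordinary regression on the precomputed feature Shapley values, and the expensive permutation loop does not need to be rerun each time the weights change.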
