🤖 AI Summary
This paper addresses feature redundancy in node representations within graph neural networks (GNNs) by proposing an adaptive, interpretable feature selection method. The approach dynamically evaluates node feature importance during training: it measures performance degradation on the validation set under interventional feature perturbations, theoretically models the coupling between GNN performance, node features, and graph structure, and progressively prunes irrelevant features based on their evolving correlation trajectories. Its core innovation lies in establishing a model- and task-agnostic online feature selection framework compatible with end-to-end joint optimization. Extensive experiments across multiple GNN architectures—including GCN, GAT, and GIN—on real-world graph datasets demonstrate that the method consistently improves classification accuracy (average gain of +1.2%), reduces input dimensionality (up to 40% reduction), and enhances model interpretability.
📝 Abstract
We propose an adaptive node feature selection approach for graph neural networks (GNNs) that identifies and removes unnecessary features during training. The ability to measure how features contribute to model output is key for interpreting decisions, reducing dimensionality, and even improving performance by eliminating unhelpful variables. However, graph-structured data introduces complex dependencies that may not be amenable to classical feature importance metrics. Inspired by this challenge, we present a model- and task-agnostic method that determines relevant features during training based on changes in validation performance upon permuting feature values. We theoretically motivate our intervention-based approach by characterizing how GNN performance depends on the relationships between node data and graph structure. Not only do we return feature importance scores once training concludes, we also track how relevance evolves as features are successively dropped. We can therefore monitor if features are eliminated effectively and also evaluate other metrics with this technique. Our empirical results verify the flexibility of our approach to different graph architectures as well as its adaptability to more challenging graph learning settings.