No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets

📅 2025-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the long-standing absence of principled quality-assessment criteria for datasets in graph learning. The authors propose RINGS, a first-principles framework for evaluating graph-dataset quality. Methodologically, RINGS applies perturbations to both modes -- the graph structure and the node features -- and quantifies a dataset's discriminative power via two measures: *performance separability* and *mode complementarity*. The contributions are threefold: (1) a systematic, dataset-centric evaluation paradigm, distinct from model-centric benchmarks; (2) novel diagnostic tools, including mode ablations, mode-perturbation sensitivity analyses, benchmark-robustness diagnostics, and controlled degradation experiments; and (3) an empirical evaluation across 12 widely used graph datasets that reveals pervasive structural redundancy and feature-dominant bias in several popular benchmarks. Based on these findings, the authors provide actionable recommendations for dataset curation, design, and quality assurance.
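To make the mode-perturbation idea concrete, here is a minimal sketch of what such dataset ablations could look like, assuming graphs are given as `(adjacency, features)` NumPy arrays. The perturbation names and this representation are illustrative assumptions, not the paper's exact API.

```python
import numpy as np

# Each perturbation maps (adjacency, features) -> (adjacency, features),
# degrading exactly one mode while leaving the other intact.

def empty_graph(adj, feats):
    """Structural ablation: remove all edges, keep node features."""
    return np.zeros_like(adj), feats

def random_graph(adj, feats, seed=0):
    """Structural perturbation: rewire to an Erdos-Renyi graph with
    (roughly) the same edge density, keeping node features."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    p = adj.sum() / max(n * (n - 1), 1)           # observed edge density
    upper = np.triu(rng.random((n, n)) < p, k=1)  # sample upper triangle
    rand = upper.astype(adj.dtype)
    return rand + rand.T, feats                   # symmetrize

def random_features(adj, feats, seed=0):
    """Feature ablation: replace node features with Gaussian noise."""
    rng = np.random.default_rng(seed)
    return adj, rng.standard_normal(feats.shape)

# The original dataset plus its perturbed representations form the
# ablation family that downstream measures compare against.
PERTURBATIONS = {
    "original": lambda a, x: (a, x),
    "empty-graph": empty_graph,
    "random-graph": random_graph,
    "random-features": random_features,
}
```

Since each perturbation degrades exactly one mode, comparing a method's behavior across the family isolates how much that mode actually contributes.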

📝 Abstract
Benchmark datasets have proved pivotal to the success of graph learning, and good benchmark datasets are crucial to guide the development of the field. Recent research has highlighted problems with graph-learning datasets and benchmarking practices -- revealing, for example, that methods that ignore the graph structure can outperform graph-based approaches on popular benchmark datasets. Such findings raise two questions: (1) What makes a good graph-learning dataset, and (2) how can we evaluate dataset quality in graph learning? Our work addresses these questions. As the classic evaluation setup uses datasets to evaluate models, it does not apply to dataset evaluation. Hence, we start from first principles. Observing that graph-learning datasets uniquely combine two modes -- the graph structure and the node features -- we introduce RINGS, a flexible and extensible mode-perturbation framework to assess the quality of graph-learning datasets based on dataset ablations -- i.e., by quantifying differences between the original dataset and its perturbed representations. Within this framework, we propose two measures -- performance separability and mode complementarity -- as evaluation tools, each assessing, from a distinct angle, the capacity of a graph dataset to benchmark the power and efficacy of graph-learning methods. We demonstrate the utility of our framework for graph-learning dataset evaluation in an extensive set of experiments and derive actionable recommendations for improving the evaluation of graph-learning methods. Our work opens new research directions in data-centric graph learning, and it constitutes a first step toward the systematic evaluation of evaluations.
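As one plausible instantiation of performance separability (a sketch of the idea, not the paper's exact definition): train the same model on the original dataset and on a perturbed variant across several seeds, then test whether the original's scores significantly exceed the perturbed ones.

```python
import numpy as np
from scipy.stats import wilcoxon  # paired, nonparametric signed-rank test

def separability(orig_scores, pert_scores, alpha=0.05):
    """Do paired per-seed scores on the original dataset significantly
    exceed those on a perturbed variant? If not, the benchmark cannot
    separate methods that use the ablated mode from those that ignore it."""
    orig = np.asarray(orig_scores, dtype=float)
    pert = np.asarray(pert_scores, dtype=float)
    _, p_value = wilcoxon(orig, pert, alternative="greater")
    return {
        "mean_gap": float(orig.mean() - pert.mean()),
        "p_value": float(p_value),
        "separable": bool(p_value < alpha),
    }

# Usage: per-seed accuracies on the original dataset vs. the empty-graph
# ablation (hypothetical numbers, for illustration only).
print(separability([0.81, 0.80, 0.82, 0.79, 0.81],
                   [0.78, 0.77, 0.80, 0.76, 0.78]))
```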
Problem

Research questions and friction points this paper is trying to address.

Defining quality in graph-learning datasets
Evaluating dataset effectiveness for graph methods
Absence of principled, dataset-centric assessment tools
Innovation

Methods, ideas, or system contributions that make the work stand out.

RINGS framework for dataset evaluation
Performance separability measure introduced
Mode complementarity measure proposed (see the sketch below)
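Mode complementarity can be illustrated with a rough proxy (not the paper's definition): ask how much the feature metric space and the graph metric space disagree. If pairwise feature distances and shortest-path distances are strongly rank-correlated, the two modes are largely redundant; if they are uncorrelated, each mode carries information the other lacks.

```python
import numpy as np
import networkx as nx
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def complementarity_proxy(adj, feats):
    """Rough proxy for mode complementarity: 1 - |rank correlation|
    between pairwise feature distances and shortest-path distances.
    Near 1 means the modes carry largely independent information;
    near 0 means one mode is mostly recoverable from the other."""
    n = adj.shape[0]
    G = nx.from_numpy_array(adj)
    spl = dict(nx.all_pairs_shortest_path_length(G))
    feat_d = pdist(feats)  # condensed Euclidean distances over pairs i < j
    graph_d = np.array([spl[i].get(j, n)  # n stands in for "disconnected"
                        for i in range(n) for j in range(i + 1, n)],
                       dtype=float)
    rho, _ = spearmanr(feat_d, graph_d)
    return 1.0 - abs(rho)
```

On benchmarks where such a proxy is near zero, the structure adds little beyond the features, which is consistent with the structural redundancy highlighted in the summary above.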