🤖 AI Summary
Graph machine learning has long suffered from benchmark fragmentation: datasets are task-specific, evaluation protocols lack standardization, and out-of-distribution (OOD) generalization is rarely considered, which severely hinders reproducibility and cross-model comparison. To address this, we introduce GraphBench, a cross-domain, multi-task graph learning benchmark platform supporting node-, edge-, and graph-level prediction as well as generative tasks. GraphBench features standardized data splits, a unified evaluation protocol, and an automated hyperparameter tuning framework; notably, it integrates OOD generalization metrics directly into its core evaluation suite. We establish principled baselines using message-passing GNNs and graph transformers, conducting systematic evaluations across diverse datasets. GraphBench improves evaluation consistency and result comparability, providing reproducible, scalable, and standardized infrastructure for graph learning research.
📝 Abstract
Machine learning on graphs has recently achieved impressive progress in various domains, including molecular property prediction and chip design. However, benchmarking practices remain fragmented, often relying on narrow, task-specific datasets and inconsistent evaluation protocols, which hampers reproducibility and broader progress. To address this, we introduce GraphBench, a comprehensive benchmarking suite that spans diverse domains and prediction tasks, including node-level, edge-level, graph-level, and generative settings. GraphBench provides standardized evaluation protocols -- with consistent dataset splits and performance metrics that account for out-of-distribution generalization -- as well as a unified hyperparameter tuning framework. In addition, we evaluate message-passing neural networks and graph transformer models on GraphBench, providing principled baselines and establishing reference performance. See www.graphbench.io for further details.
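To make the idea of a unified evaluation protocol concrete, here is a minimal sketch of the pattern the abstract describes: fixed, named dataset splits (including a held-out OOD split) and a single metric applied identically to every model. All names here (`EvalProtocol`, `accuracy`, the toy data) are illustrative assumptions, not GraphBench's actual API.

```python
# Hypothetical sketch of a standardized evaluation protocol: every model
# is scored with the same metric on the same fixed splits, so in-distribution
# and out-of-distribution results are directly comparable across models.
# These names are assumptions for illustration, not GraphBench's real API.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Stand-in for a graph example: (feature vector, label).
Example = Tuple[List[float], int]

@dataclass
class EvalProtocol:
    # e.g. keys "train", "test_id" (in-distribution), "test_ood"
    splits: Dict[str, List[Example]]
    metric: Callable[[List[int], List[int]], float]

    def evaluate(self, model: Callable[[List[float]], int]) -> Dict[str, float]:
        # Score every held-out split with the same metric; the training
        # split is excluded from reporting.
        return {
            name: self.metric([model(x) for x, _ in data], [y for _, y in data])
            for name, data in self.splits.items()
            if name != "train"
        }

def accuracy(pred: List[int], true: List[int]) -> float:
    return sum(p == t for p, t in zip(pred, true)) / len(true)

# Usage with a toy threshold "model":
protocol = EvalProtocol(
    splits={
        "train": [([0.0], 0)],
        "test_id": [([1.0], 1), ([0.0], 0)],
        "test_ood": [([2.0], 1)],  # shifted feature range
    },
    metric=accuracy,
)
scores = protocol.evaluate(lambda x: int(x[0] >= 0.5))
```

The key design point this sketch illustrates is that the split definitions and the metric live in the protocol, not in each model's own evaluation script, which is what makes scores comparable across submissions.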