AI Summary
This work addresses the lack of systematic evaluation of graph neural networks' inductive generalization capability on entirely unseen graph structures. To this end, we propose the first large-scale graph generation framework specifically designed for inductive learning benchmarking. The framework synthesizes semantically coherent, community-persistent graph families, enabling fine-grained control over critical structural properties, including homophily and degree distribution, as well as controllable injection of distribution shifts to assess robustness. Leveraging this framework, we conduct a comprehensive benchmark study across mainstream architectures: GNNs, graph transformers, and topology-aware models. Our results reveal a weak correlation between transductive performance and inductive generalization ability; moreover, model robustness is highly sensitive to the initial graph's structural characteristics. These findings uncover a strong architecture-structure coupling, i.e., a deep interdependence between model design and underlying graph topology, highlighting fundamental limitations of current graph representation learning paradigms.
Abstract
A fundamental challenge in graph learning is understanding how models generalize to new, unseen graphs. While synthetic benchmarks offer controlled settings for analysis, existing approaches are confined to single-graph, transductive settings where models train and test on the same graph structure. Addressing this gap, we introduce GraphUniverse, a framework for generating entire families of graphs to enable the first systematic evaluation of inductive generalization at scale. Our core innovation is the generation of graphs with persistent semantic communities, ensuring conceptual consistency while allowing fine-grained control over structural properties like homophily and degree distributions. This enables crucial but underexplored robustness tests, such as performance under controlled distribution shifts. Benchmarking a wide range of architectures -- from GNNs to graph transformers and topology-aware models -- reveals that strong transductive performance is a poor predictor of inductive generalization. Furthermore, we find that robustness to distribution shift is highly sensitive not only to model architecture choice but also to the initial graph regime (e.g., high vs. low homophily). Beyond benchmarking, GraphUniverse's flexibility and scalability can facilitate the development of robust and truly generalizable architectures -- including next-generation graph foundation models. An interactive demo is available at https://graphuniverse.streamlit.app.
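To make the idea of "graph families with controlled homophily and degree" concrete, here is a minimal sketch of such a generator. It is a hypothetical illustration, not GraphUniverse's actual algorithm: it uses a plain stochastic block model in which the intra- and inter-community edge probabilities are solved from a target edge-level homophily and average degree, and a "family" is simply a set of independently sampled graphs sharing the same community label set. All function names (`sample_sbm_graph`, `sample_family`) and parameters are assumptions introduced for this example.

```python
import random

def sample_sbm_graph(n_nodes, n_communities, homophily, avg_degree, rng):
    # Hypothetical sketch (not the paper's generator): a stochastic block
    # model whose edge probabilities are derived from two targets.
    # With balanced communities, expected degree d satisfies
    #   d ≈ (n/k) * (p_in + (k-1) * p_out)
    # and edge homophily h ≈ p_in / (p_in + (k-1) * p_out),
    # so set s = p_in + (k-1)*p_out = d*k/n, p_in = h*s, p_out = (1-h)*s/(k-1).
    labels = [i % n_communities for i in range(n_nodes)]
    rng.shuffle(labels)  # balanced, randomly placed community labels
    s = avg_degree * n_communities / n_nodes
    p_in = min(1.0, homophily * s)
    p_out = min(1.0, (1.0 - homophily) * s / (n_communities - 1))
    edges = []
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges

def sample_family(n_graphs, seed=0, **graph_kwargs):
    # A "graph family": independently sampled graphs that share the same
    # community label set, loosely mimicking persistent semantic communities.
    rng = random.Random(seed)
    return [sample_sbm_graph(rng=rng, **graph_kwargs) for _ in range(n_graphs)]

if __name__ == "__main__":
    family = sample_family(3, seed=0, n_nodes=200, n_communities=4,
                           homophily=0.8, avg_degree=10)
    labels, edges = family[0]
    same = sum(labels[u] == labels[v] for u, v in edges)
    # Empirical edge homophily should land near the 0.8 target.
    print(f"empirical homophily: {same / len(edges):.2f}")
```

A distribution shift in this toy setting is then just a change of generator parameters between train- and test-time families (e.g., training on `homophily=0.8` graphs and evaluating on `homophily=0.3` ones), which is the kind of controlled robustness test the abstract describes.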