🤖 AI Summary
Problem: Existing graph-learning benchmarks consist mostly of small graphs tailored to inductive tasks, offering little insight into long-range dependencies, and mainstream models — GNNs and Graph Transformers — are compared without any direct, quantitative measure of long-range interaction. Method: The authors introduce (1) City-Networks, a large-scale transductive dataset built from real-world urban road networks, with graphs of over 10⁵ nodes and far larger diameters than existing benchmarks, annotated via an eccentricity-based labeling task that explicitly requires information from distant nodes; and (2) a model-agnostic influence measure based on the Jacobians of multi-hop neighbors, giving a principled quantification of long-range information flow. Contribution/Results: Theoretical analysis of over-smoothing and influence-score dilution justifies both the dataset design and the proposed measure, and empirical evaluation exposes the long-range performance bottlenecks of state-of-the-art GNNs, pushing graph representation learning from purely local aggregation toward global structural modeling.
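To make the labeling task concrete: a node's eccentricity is its greatest shortest-path distance to any other node, so predicting it forces a model to see far beyond its local neighborhood. The sketch below (illustrative only; function names and the binning scheme are assumptions, not taken from the paper) computes eccentricities by BFS on an adjacency-list graph and bins them into class labels:

```python
from collections import deque

def bfs_eccentricity(adj, src):
    """Eccentricity of `src`: max BFS distance over all reachable nodes."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

def eccentricity_labels(adj, num_classes=4):
    """Bin each node's eccentricity into one of `num_classes` labels
    (an illustrative binning; the paper's exact scheme may differ)."""
    ecc = {v: bfs_eccentricity(adj, v) for v in adj}
    lo, hi = min(ecc.values()), max(ecc.values())
    span = max(hi - lo, 1)
    return {v: min(num_classes - 1, (e - lo) * num_classes // span)
            for v, e in ecc.items()}

# Toy example: an 8-node path graph (diameter 7). Endpoints get the
# largest label, central nodes the smallest — a k-layer GNN with
# k < diameter cannot even see the information the label depends on.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
labels = eccentricity_labels(adj)
```

On real city networks with >10⁵ nodes, exact all-pairs BFS is expensive, which is part of why high-diameter road graphs make a demanding long-range benchmark.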
📝 Abstract
Long-range dependencies are critical for effective graph representation learning, yet most existing datasets focus on small graphs tailored to inductive tasks, offering limited insight into long-range interactions. Current evaluations primarily compare models employing global attention (e.g., graph transformers) with those using local neighborhood aggregation (e.g., message-passing neural networks) without a direct measurement of long-range dependency. In this work, we introduce City-Networks, a novel large-scale transductive learning dataset derived from real-world city roads. This dataset features graphs with over $10^5$ nodes and significantly larger diameters than those in existing benchmarks, naturally embodying long-range information. We annotate the graphs using an eccentricity-based approach, ensuring that the classification task inherently requires information from distant nodes. Furthermore, we propose a model-agnostic measurement based on the Jacobians of neighbors from distant hops, offering a principled quantification of long-range dependencies. Finally, we provide theoretical justifications for both our dataset design and the proposed measurement, particularly by focusing on over-smoothing and influence score dilution, which establishes a robust foundation for further exploration of long-range interactions in graph neural networks.
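The Jacobian-based measurement asks how much a distant node's input features affect a node's final representation, i.e. the magnitude of $\partial h_i / \partial x_j$. For a linear mean-aggregation model with $L$ layers — an assumption made here purely to get a closed form, not the paper's actual models — that Jacobian reduces to the $(i,j)$ entry of the $L$-th power of the normalized adjacency, which lets a small pure-Python sketch illustrate influence dilution (all names below are illustrative):

```python
def normalized_adjacency(adj, n):
    """Row-normalized adjacency with self-loops (mean aggregation)."""
    A = [[0.0] * n for _ in range(n)]
    for u in range(n):
        nbrs = adj[u] + [u]          # include self-loop
        w = 1.0 / len(nbrs)
        for v in nbrs:
            A[u][v] = w
    return A

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def influence(adj, n, layers):
    """(A_norm^L)[i][j]: the Jacobian-based influence of input node j on
    node i's representation, under the linear-model assumption above."""
    A = normalized_adjacency(adj, n)
    P = A
    for _ in range(layers - 1):
        P = matmul(P, A)
    return P

# Toy 6-node path graph: node 5 has zero influence on node 0 until the
# layer count reaches the 5-hop distance (under-reaching), and even then
# its influence is a tiny slice of node 0's total — influence dilution.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
P2 = influence(adj, 6, 2)   # 2 layers: node 5 invisible to node 0
P5 = influence(adj, 6, 5)   # 5 layers: influence 0.5 * (1/3)^4 ≈ 0.006
```

Because each row of the normalized adjacency sums to one, each row of its powers does too, so the influence entries directly show what fraction of a node's representation a distant neighbor can contribute — and why stacking layers alone does not solve long-range modeling.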