🤖 AI Summary
Existing graph learning research predominantly focuses on either heterogeneous graphs with homophily or homogeneous graphs with heterophily, overlooking real-world graphs that simultaneously exhibit heterogeneity (diverse node and edge types) and heterophily (label dissimilarity among adjacent nodes), and it lacks systematic benchmarks for this combined setting. This work introduces H2GB, the first benchmark for heterogeneous, heterophilic graph learning, comprising nine real-world graphs spanning five domains, 28 baseline model implementations, and 26 benchmark results. The authors formally define the task and propose UnifiedGT, a modular Graph Transformer framework, along with an enhanced variant, H2G-former, which combines masked label embeddings, cross-type heterogeneous attention, and type-specific feed-forward networks to jointly model both forms of complexity. Experiments show that H2G-former achieves an average accuracy improvement of 5.2% over state-of-the-art methods. Both the code and datasets are publicly released to foster standardization and progress in the field.
📝 Abstract
Many real-world graphs present challenges for graph learning due to the presence of both heterophily and heterogeneity. However, existing graph learning benchmarks often focus on heterogeneous graphs with homophily or homogeneous graphs with heterophily, leaving a gap in understanding how methods perform on graphs that are both heterogeneous and heterophilic. To bridge this gap, we introduce H2GB, a novel graph benchmark that brings together the complexities of both heterophily and heterogeneity. Our benchmark encompasses 9 diverse real-world datasets across 5 domains, 28 baseline model implementations, and 26 benchmark results. In addition, we present UnifiedGT, a modular graph transformer framework, and a new model variant, H2G-former, that excels on this challenging benchmark. By integrating masked label embeddings, cross-type heterogeneous attention, and type-specific feed-forward networks (FFNs), H2G-former effectively tackles both graph heterophily and heterogeneity. Extensive experiments across 26 baselines on H2GB reveal the inadequacy of current models for heterogeneous, heterophilic graph learning and demonstrate the superiority of H2G-former over existing solutions. Both the benchmark and the framework are available on GitHub (https://github.com/junhongmit/H2GB) and PyPI (https://pypi.org/project/H2GB), and documentation can be found at https://junhongmit.github.io/H2GB/.