🤖 AI Summary
Existing graph learning research predominantly focuses on either heterogeneous graphs with homophily or homogeneous graphs with heterophily, overlooking real-world graphs that simultaneously exhibit heterogeneity (diverse node and edge types) and heterophily (label dissimilarity among adjacent nodes), and it lacks systematic benchmarks for this combined setting. This work introduces H2GB, the first benchmark for heterogeneous, heterophilic graph learning, comprising nine real-world graphs spanning five domains, 28 baseline model implementations, and 26 benchmark results. The authors formally define the task and propose UnifiedGT, a modular Graph Transformer framework, along with an enhanced variant, H2G-former, which combines masked label embeddings, cross-type heterogeneous attention, and type-specific feed-forward networks to jointly model both forms of complexity. Experiments show that H2G-former achieves an average accuracy improvement of 5.2% over state-of-the-art methods. Both the code and datasets are publicly released to foster standardization and progress in the field.
📝 Abstract
Many real-world graphs present challenges for graph learning due to the presence of both heterophily and heterogeneity. However, existing graph learning benchmarks often focus on heterogeneous graphs with homophily or homogeneous graphs with heterophily, leaving a gap in understanding how methods perform on graphs that are both heterogeneous and heterophilic. To bridge this gap, we introduce H2GB, a novel graph benchmark that brings together the complexities of both heterophily and heterogeneity. Our benchmark encompasses 9 diverse real-world datasets across 5 domains, 28 baseline model implementations, and 26 benchmark results. In addition, we present UnifiedGT, a modular graph transformer framework, and a new model variant, H2G-former, that excels on this challenging benchmark. By integrating masked label embeddings, cross-type heterogeneous attention, and type-specific feed-forward networks (FFNs), H2G-former effectively tackles both graph heterophily and heterogeneity. Extensive experiments across 26 baselines on H2GB reveal the inadequacy of current models for heterogeneous, heterophilic graph learning and demonstrate the superiority of H2G-former over existing solutions. Both the benchmark and the framework are available on GitHub (https://github.com/junhongmit/H2GB) and PyPI (https://pypi.org/project/H2GB), and documentation can be found at https://junhongmit.github.io/H2GB/.