🤖 AI Summary
Existing heterogeneous graph neural networks (HGNNs) neglect the tree-like hierarchy that naturally exists among metapaths, which limits their capacity to jointly model structural and semantic information in heterogeneous graphs. To address this, we propose HetTree, the first model to explicitly encode metapath hierarchies via a *semantic tree*, in which nodes represent metapaths and edges denote parent-child relationships. HetTree introduces a *subtree attention mechanism* for cross-level message passing, which emphasizes the metapaths most helpful for encoding parent-child relationships, and a *feature-label alignment strategy* that matches precomputed metapath-level features with the corresponding labels to form a complete metapath representation. Evaluated on multiple real-world heterogeneous graph datasets, HetTree outperforms existing baselines on open benchmarks and scales efficiently to graphs with millions of nodes and edges.
📝 Abstract
Interest in Heterogeneous Graph Neural Networks (HGNNs) has grown in recent years, since many real-world graphs, from citation graphs to email graphs, are heterogeneous in nature. However, existing methods ignore the tree hierarchy among metapaths that arises naturally from the different node types and relation types. In this paper, we present HetTree, a novel HGNN that models both the graph structure and its heterogeneous aspects in a scalable and effective manner. Specifically, HetTree builds a semantic tree data structure to capture the hierarchy among metapaths. To encode the semantic tree effectively, HetTree uses a novel subtree attention mechanism that emphasizes the metapaths most helpful in encoding parent-child relationships. Moreover, HetTree carefully matches pre-computed features with their corresponding labels, so that together they constitute a complete metapath representation. Our evaluation of HetTree on a variety of real-world datasets demonstrates that it outperforms all existing baselines on open benchmarks and scales efficiently to large real-world graphs with millions of nodes and edges.
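To make the semantic tree and the attention over its children more concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes that a metapath is written as a tuple of node types (relation types are ignored for brevity) and that the parent of a metapath is the metapath with its last node type removed. The helper names `build_semantic_tree` and `attend_over_children` are hypothetical, and the dot-product scoring is a generic stand-in for whatever scoring function HetTree actually uses.

```python
import math
from collections import defaultdict


def build_semantic_tree(metapaths):
    """Map each metapath to its children.

    Assumption (not from the paper): the parent of a metapath is its
    prefix with the last node type dropped, e.g. ("P", "A") is the
    parent of ("P", "A", "P").
    """
    known = set(metapaths)
    children = defaultdict(list)
    # Sort by (length, content) so the child order is deterministic.
    for path in sorted(known, key=lambda p: (len(p), p)):
        if len(path) == 1:
            continue  # a single node type acts as the root
        parent = path[:-1]
        if parent not in known:
            raise ValueError(f"missing parent metapath for {path}")
        children[parent].append(path)
    return dict(children)


def attend_over_children(parent_vec, child_vecs):
    """Softmax attention of one parent over its child representations.

    Scores are plain dot products (an illustrative choice). Returns the
    attention weights and the weighted sum of the child vectors.
    """
    scores = [sum(p * c for p, c in zip(parent_vec, vec)) for vec in child_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(parent_vec)
    aggregated = [
        sum(w * vec[i] for w, vec in zip(weights, child_vecs)) for i in range(dim)
    ]
    return weights, aggregated


# Toy metapaths for a citation graph with paper (P) and author (A) nodes.
tree = build_semantic_tree([("P",), ("P", "A"), ("P", "P"), ("P", "A", "P")])
weights, aggregated = attend_over_children([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup the root `("P",)` has two children, and the child whose vector is closer to the parent's receives the larger attention weight. A real model would learn the representations and the scoring function rather than use fixed vectors.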