AI Summary
In graph learning, conventional graph coarsening methods often induce node feature homogenization and loss of fine-grained structural details. To address this, we propose a hierarchical Graph Transformer architecture that treats node clusters as fundamental tokens, bypassing explicit coarsening and thereby preserving information integrity. Our core innovation is the Node-to-Cluster Attention (N2C-Attn) mechanism, which integrates multi-kernel learning with kernelized attention to enable dynamic, bidirectional coupling between node-level and cluster-level representations. Additionally, we introduce cluster-level message passing and a linear-complexity self-attention design to ensure scalability. Extensive experiments on multiple graph-level benchmark tasks demonstrate significant improvements over state-of-the-art methods, achieving superior modeling expressiveness without compromising computational efficiency. The implementation is publicly available.
Abstract
In the realm of graph learning, there is a category of methods that conceptualize graphs as hierarchical structures, utilizing node clustering to capture broader structural information. While generally effective, these methods often rely on a fixed graph coarsening routine, leading to overly homogeneous cluster representations and loss of node-level information. In this paper, we envision the graph as a network of interconnected node sets without compressing each cluster into a single embedding. To enable effective information transfer among these node sets, we propose the Node-to-Cluster Attention (N2C-Attn) mechanism. N2C-Attn incorporates techniques from Multiple Kernel Learning into the kernelized attention framework, effectively capturing information at both node and cluster levels. We then devise an efficient form for N2C-Attn using the cluster-wise message-passing framework, achieving linear time complexity. We further analyze how N2C-Attn combines bi-level feature maps of queries and keys, demonstrating its capability to merge dual-granularity information. The resulting architecture, Cluster-wise Graph Transformer (Cluster-GT), which uses node clusters as tokens and employs our proposed N2C-Attn module, shows superior performance on various graph-level tasks. Code is available at https://github.com/LUMIA-Group/Cluster-wise-Graph-Transformer.
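The linear-time claim can be illustrated with a small sketch. It is not the authors' implementation (see the repository for that), and it rests on assumptions that the abstract does not confirm: the cluster-level and node-level kernels are combined as a product (one possible Multiple Kernel Learning combination), both kernels use the positive feature map elu(x) + 1 borrowed from linear attention, and the output is normalized by the sum of the weights. All names (`feature_map`, `n2c_attn_naive`, `n2c_attn_factored`, `Q`, `K`, `q`, `k`, `v`) are hypothetical. The naive form scores every node of every neighbouring cluster for each query cluster; the factored form first summarises each cluster once from its node-level feature maps and then passes those summaries between clusters.

```python
import math
import random


def feature_map(x):
    """Positive feature map elu(x) + 1 (an assumed choice, common in linear attention)."""
    return [a + 1.0 if a > 0 else math.exp(a) for a in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def n2c_attn_naive(Q, K, q, k, v, clusters, adj):
    """Reference form: cluster i scores every node t of each neighbouring cluster j."""
    out = []
    for i in range(len(clusters)):
        num, den = [0.0] * len(v[0]), 0.0
        for j, members in enumerate(clusters):
            if not adj[i][j]:
                continue
            cluster_score = dot(feature_map(Q[i]), feature_map(K[j]))
            for t in members:
                node_score = dot(feature_map(q[i]), feature_map(k[t]))
                w = cluster_score * node_score  # assumed product of the two kernels
                num = [n + w * x for n, x in zip(num, v[t])]
                den += w
        out.append([n / den for n in num])
    return out


def n2c_attn_factored(Q, K, q, k, v, clusters, adj):
    """Same result, computed as per-cluster summaries plus cluster-wise message passing."""
    d_k, d_v = len(k[0]), len(v[0])
    # Step 1: visit each node once to build S_j = sum_t phi(k_t) v_t^T and z_j = sum_t phi(k_t).
    S, z = [], []
    for members in clusters:
        S_j = [[0.0] * d_v for _ in range(d_k)]
        z_j = [0.0] * d_k
        for t in members:
            phi_k = feature_map(k[t])
            for a in range(d_k):
                z_j[a] += phi_k[a]
                for b in range(d_v):
                    S_j[a][b] += phi_k[a] * v[t][b]
        S.append(S_j)
        z.append(z_j)
    # Step 2: each cluster reads the summaries of its neighbouring clusters.
    out = []
    for i in range(len(clusters)):
        phi_q = feature_map(q[i])
        num, den = [0.0] * d_v, 0.0
        for j in range(len(clusters)):
            if not adj[i][j]:
                continue
            cluster_score = dot(feature_map(Q[i]), feature_map(K[j]))
            for b in range(d_v):
                num[b] += cluster_score * sum(phi_q[a] * S[j][a][b] for a in range(d_k))
            den += cluster_score * dot(phi_q, z[j])
        out.append([n / den for n in num])
    return out


def random_vectors(rng, count, dim):
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(count)]


# Toy graph: 8 nodes in 3 clusters; cluster adjacency includes self-loops.
rng = random.Random(0)
clusters = [[0, 1, 2], [3, 4], [5, 6, 7]]
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
Q = random_vectors(rng, 3, 4)  # cluster-level queries
K = random_vectors(rng, 3, 4)  # cluster-level keys
q = random_vectors(rng, 3, 4)  # node-level query held by each cluster
k = random_vectors(rng, 8, 4)  # node-level keys
v = random_vectors(rng, 8, 2)  # node-level values

naive = n2c_attn_naive(Q, K, q, k, v, clusters, adj)
fast = n2c_attn_factored(Q, K, q, k, v, clusters, adj)
max_gap = max(abs(a - b) for ra, rb in zip(naive, fast) for a, b in zip(ra, rb))
print("largest difference between the two forms:", max_gap)
```

In this sketch, step 1 touches each node once and step 2 touches each pair of adjacent clusters once, so the cost grows with the number of nodes plus the number of cluster-level edges rather than with the number of (cluster, node) pairs. No cluster is compressed into a single embedding before attention: the summaries are built from node-level keys and values and are weighted by the node-level query at read time.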