🤖 AI Summary
This work addresses the limitation of existing methods that model semantics only at the token level when processing textual attributed graphs, thereby neglecting structural dependencies among node texts. To overcome this, the authors propose a structure-aware dual-granularity text encoder that cascades two pretrained language models to sequentially capture semantics at both token and node levels. Crucially, they introduce a dynamic mask within the self-attention mechanism, constrained by graph topology, to differentiate the contributions of central nodes and neighboring contexts to token importance. This approach represents the first effort to integrate graph topological information directly into the attention mechanism of language models, enabling structure-aware, fine-grained semantic encoding. The method achieves state-of-the-art performance, significantly outperforming existing approaches across multiple benchmark datasets.
📝 Abstract
Text-attributed graphs integrate semantic information of node texts with topological structure, offering significant value in various applications such as document classification and information extraction. Existing approaches typically encode textual content using language models (LMs), followed by graph neural networks (GNNs) to process structural information. However, during the LM-based text encoding phase, most methods not only perform semantic interaction solely at the word-token granularity, but also neglect the structural dependencies among texts from different nodes. In this work, we propose DuConTE, a dual-granularity text encoder with topology-constrained attention. The model employs a cascaded architecture of two pretrained LMs, encoding semantics first at the word-token granularity and then at the node granularity. During the self-attention computation in each LM, we dynamically adjust the attention mask matrix based on node connectivity, guiding the model to learn semantic correlations informed by the graph structure. Furthermore, when composing node representations from word-token embeddings, we separately evaluate the importance of tokens under the center-node context and the neighborhood context, enabling the capture of more contextually relevant semantic information. Extensive experiments on multiple benchmark datasets demonstrate that DuConTE achieves state-of-the-art performance on the majority of them.