🤖 AI Summary
Existing temporal graph neural networks (TGNNs) struggle to jointly model the co-evolving textual semantics and graph topology in temporal text-attributed graphs (TTAGs), typically resorting to static text encoding and over-relying on structural information while neglecting semantic-structural co-evolution. To address this, we propose an LLM-driven temporal semantic extractor and a semantic-structural co-encoder that enable bidirectional enhancement and dynamic alignment between textual semantics and graph topology. Our method synergistically integrates the fine-grained semantic understanding capability of large language models (LLMs) with the structural modeling strength of TGNNs, incorporating dynamic text neighborhood modeling and joint embedding mechanisms. Evaluated on four public benchmarks and one industrial dataset, our approach consistently outperforms state-of-the-art methods, demonstrating superior effectiveness, generalizability, and robustness of semantic-structural co-modeling for TTAGs.
📝 Abstract
Temporal graph neural networks (TGNNs) have shown remarkable performance in temporal graph modeling. However, real-world temporal graphs often carry rich textual information, giving rise to temporal text-attributed graphs (TTAGs). This combination of dynamic text semantics and evolving graph structures introduces heightened complexity. Existing TGNNs embed texts statically and rely heavily on encoding mechanisms that disproportionately prioritize structural information, overlooking both the temporal evolution of text semantics and the essential interplay between semantics and structures needed for synergistic reinforcement. To tackle these issues, we present **Cross**, a novel framework that seamlessly extends existing TGNNs for TTAG modeling. The key idea is to employ advanced large language models (LLMs) to extract dynamic semantics in the text space and then generate expressive representations that unify both semantics and structures. Specifically, we propose a Temporal Semantics Extractor in the Cross framework, which empowers the LLM to understand each node's evolving context of textual neighborhoods, capturing semantic dynamics. We then introduce a Semantic-structural Co-encoder, which collaborates with the Extractor to synthesize expressive representations by jointly considering semantic and structural information while encouraging their mutual reinforcement. Extensive experiments on four public datasets and one practical industrial dataset demonstrate Cross's significant effectiveness and robustness.
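The two components described above can be illustrated with a minimal sketch. Everything here is a hypothetical toy stand-in, not the authors' implementation: `llm_semantic_embedding` replaces a real LLM encoder with a deterministic dummy, the extractor simply filters a node's timestamped texts up to a query time, and the co-encoder fuses semantic and structural vectors by a weighted mix.

```python
# Hypothetical sketch of the Cross pipeline from the abstract.
# Function names, dimensions, and the fusion rule are illustrative
# assumptions, not the paper's actual design.

def llm_semantic_embedding(text: str, dim: int = 8) -> list:
    """Stand-in for an LLM text encoder: a deterministic toy embedding."""
    vec = [0.0] * dim
    for i, byte in enumerate(text.encode()):
        vec[i % dim] += byte / 255.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def temporal_semantics_extractor(node_texts, timestamp):
    """Toy Temporal Semantics Extractor: embed the node's textual
    neighborhood as it exists up to `timestamp`."""
    visible = [text for ts, text in node_texts if ts <= timestamp]
    context = " ".join(visible) or "<empty>"
    return llm_semantic_embedding(context)

def co_encoder(semantic_vec, structural_vec, alpha=0.5):
    """Toy Semantic-structural Co-encoder: fuse the two views into one
    representation (a real co-encoder would be learned, not a fixed mix)."""
    return [alpha * s + (1 - alpha) * g
            for s, g in zip(semantic_vec, structural_vec)]

# Usage: one node with two timestamped texts and a structural embedding
# (e.g. produced by any backbone TGNN).
texts = [(1, "opens account"), (5, "reports fraud")]
structural = [0.1] * 8
h_t3 = co_encoder(temporal_semantics_extractor(texts, 3), structural)
h_t6 = co_encoder(temporal_semantics_extractor(texts, 6), structural)
print(h_t3 != h_t6)  # the node's representation changes as its texts evolve
```

The point of the sketch is the data flow: the same node gets different representations at different timestamps because its visible textual neighborhood (and hence its extracted semantics) changes over time, which static text embedding would miss.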