🤖 AI Summary
This work addresses a limitation of existing graph anomaly detection methods: when incorporating textual features, they often neglect the structural context of nodes and therefore fail to capture complex anomalies arising from inconsistencies between node content and topological role. To overcome this, the authors propose TERGAD, a framework that uses large language models (LLMs) to translate node topological properties into natural language descriptions and derive high-level structural semantic embeddings from them. These embeddings are adaptively fused with the original node features via a gated dual-branch autoencoder, which jointly reconstructs graph structure and attributes; the combined reconstruction error serves as the anomaly score. By explicitly modeling graph structural semantics, TERGAD consistently outperforms state-of-the-art methods across six real-world datasets, and ablation studies confirm the contributions of structural semantic guidance and the gating mechanism.
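To make the topology-to-text step concrete, here is a minimal sketch of how node-level structural properties might be rendered as a sentence before being passed to an LLM. The function name `describe_node`, the chosen properties (degree, clustering coefficient, mean neighbour degree), and the sentence template are all assumptions for illustration; the summary does not specify which properties or wording TERGAD uses.

```python
from itertools import combinations


def describe_node(adj, node):
    """Render simple topological properties of `node` as one English sentence.

    `adj` maps each node to the set of its neighbours (undirected graph).
    The properties and wording are illustrative assumptions, not TERGAD's.
    """
    neighbours = adj[node]
    degree = len(neighbours)
    # Local clustering coefficient: fraction of neighbour pairs that are linked.
    pairs = list(combinations(sorted(neighbours), 2))
    linked = sum(1 for u, v in pairs if v in adj[u])
    clustering = linked / len(pairs) if pairs else 0.0
    # Mean degree of the neighbours, a rough signal of hub proximity.
    mean_nbr_degree = (
        sum(len(adj[n]) for n in neighbours) / degree if degree else 0.0
    )
    return (
        f"Node {node} has {degree} neighbours, "
        f"a clustering coefficient of {clustering:.2f}, "
        f"and its neighbours have an average degree of {mean_nbr_degree:.2f}."
    )


# Toy undirected graph: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
descriptions = {n: describe_node(adj, n) for n in adj}
```

Each string in `descriptions` would then be sent to a text encoder or LLM of one's choice to obtain an embedding; that call is omitted here because the summary does not name a model or API.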
📝 Abstract
Graph Anomaly Detection (GAD) aims to identify atypical graph entities, such as nodes, edges, or substructures, that deviate significantly from the majority. While existing text-rich approaches typically integrate raw textual features into the data representation pipeline, they often neglect the structural context of nodes. This limitation hinders their ability to detect sophisticated anomalies arising from inconsistencies between a node's inherent content and its topological role. To bridge this gap, we propose TERGAD (Structure-aware Text-enhanced Representations for Graph Anomaly Detection), a novel data augmentation framework that enriches structural semantics for GAD via the semantic reasoning capabilities of Large Language Models (LLMs). Specifically, TERGAD translates node-level topological properties into descriptive natural language narratives, which are subsequently processed by an LLM to derive high-level semantic embeddings. These embeddings are then adaptively fused with original node attributes through a gated dual-branch autoencoder to jointly reconstruct both graph structure and node features. The anomaly score is computed based on the integrated reconstruction error, effectively capturing deviations in both observable attributes and LLM-informed semantic expectations. Extensive experiments on six real-world datasets demonstrate that TERGAD consistently outperforms state-of-the-art baselines. Furthermore, our ablation studies validate the indispensable role of structural semantic guidance and the efficacy of the gated fusion mechanism. Code is available at https://github.com/Kantorakitty/TERGAD-main.
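The gated fusion and the integrated reconstruction error can be illustrated with a small sketch. It is not taken from the TERGAD repository: the element-wise gate, the parameter names (`w_x`, `w_s`, `bias`), and the `alpha`-weighted sum of attribute and structure errors are assumptions, chosen because they are common in reconstruction-based anomaly detectors. The sketch also assumes the attribute vector `x` and the semantic embedding `s` have already been projected to the same dimension.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fuse(x, s, w_x, w_s, bias):
    """Fuse attributes `x` and semantic embedding `s` with an element-wise gate.

    For each dimension: g = sigmoid(w_x*x + w_s*s + bias), output g*x + (1-g)*s.
    In a trained model w_x, w_s and bias would be learned (assumed form).
    """
    fused = []
    for xi, si, wx, ws, b in zip(x, s, w_x, w_s, bias):
        g = sigmoid(wx * xi + ws * si + b)
        fused.append(g * xi + (1.0 - g) * si)
    return fused


def anomaly_score(x, x_hat, a_row, a_hat_row, alpha=0.5):
    """Weighted sum of attribute and structure reconstruction errors (L2 norms).

    `a_row` is the node's adjacency row and `a_hat_row` its reconstruction;
    `alpha` is a hypothetical balance weight.
    """
    attr_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
    struct_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(a_row, a_hat_row)))
    return alpha * attr_err + (1.0 - alpha) * struct_err


# With zero gate parameters the gate is 0.5, so the two inputs are averaged.
fused = gated_fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])

# Attributes reconstructed perfectly, one adjacency entry reconstructed wrongly.
score = anomaly_score([1.0, 0.0], [1.0, 0.0], [0, 1, 1], [0, 1, 0], alpha=0.5)
```

A node whose attributes or connections are poorly reconstructed receives a higher score; ranking nodes by this score gives the anomaly ordering. The gate lets the model lean on the original attributes where they are informative and on the structural semantics elsewhere.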