🤖 AI Summary
This work addresses the lack of standardized benchmarks and realistic anomalous-text generation methods for anomaly detection in Text-Attributed Graphs (TAGs). To this end, we introduce TAG-AD, the first dedicated benchmark, which leverages Large Language Models (LLMs) in a Retrieval-Augmented Generation (RAG) pipeline that automatically synthesizes anomalous texts that are semantically plausible yet contextually inconsistent, covering diverse anomaly types. We further propose a zero-shot anomaly detection framework that decouples global semantic knowledge modeling from local graph structural modeling, eliminating reliance on hand-crafted prompts. Experiments demonstrate that LLMs excel at identifying contextual anomalies, whereas Graph Neural Networks (GNNs) are more effective for structural anomalies. Moreover, RAG-augmented prompting achieves performance comparable to human-designed prompts, substantially enhancing zero-shot generalization and practical applicability.
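The RAG-based anomaly synthesis described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: all function names (`retrieve_context`, `build_anomaly_prompt`, `synthesize_anomalous_text`) and the toy graph format are assumptions, and `llm_generate` stands in for any LLM completion call.

```python
# Hypothetical sketch of RAG-style anomalous-text synthesis on a TAG:
# retrieve neighbor texts as grounding context, then ask an LLM for a
# rewrite that is fluent in isolation but inconsistent with that context.
# Names and data layout are illustrative assumptions, not the paper's API.

def retrieve_context(graph, node, k=3):
    """Retrieve up to k neighbor texts to ground generation (the retrieval step)."""
    neighbors = graph["edges"].get(node, [])[:k]
    return [graph["texts"][n] for n in neighbors]

def build_anomaly_prompt(node_text, context_texts, anomaly_type="contextual"):
    """Prompt for text that is plausible on its own yet clashes with neighbors."""
    context = "\n".join(f"- {t}" for t in context_texts)
    return (
        f"Neighborhood context:\n{context}\n\n"
        f"Original node text:\n{node_text}\n\n"
        f"Rewrite the node text so it remains semantically plausible in isolation "
        f"but is {anomaly_type}ly inconsistent with the neighborhood above."
    )

def synthesize_anomalous_text(graph, node, llm_generate):
    """llm_generate: any callable that maps a prompt string to generated text."""
    prompt = build_anomaly_prompt(graph["texts"][node], retrieve_context(graph, node))
    return llm_generate(prompt)
```

With a real LLM client substituted for `llm_generate`, the same loop can be run over sampled nodes to inject anomalies of different types directly in the raw text space.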
📝 Abstract
Anomaly detection on attributed graphs plays an essential role in applications such as fraud detection, intrusion monitoring, and misinformation analysis. However, text-attributed graphs (TAGs), in which node information is expressed in natural language, remain underexplored, largely due to the absence of standardized benchmark datasets. In this work, we introduce TAG-AD, a comprehensive benchmark for anomalous node detection on TAGs. TAG-AD leverages large language models (LLMs) to generate realistic anomalous node texts directly in the raw text space, producing anomalies that are semantically coherent yet contextually inconsistent and thus more reflective of real-world irregularities. In addition, TAG-AD incorporates multiple other anomaly types, enabling thorough and reproducible evaluation of graph anomaly detection (GAD) methods. With these datasets, we further benchmark existing unsupervised GNN-based GAD methods as well as zero-shot LLMs for GAD.
As part of our zero-shot detection setup, we propose a retrieval-augmented generation (RAG)-assisted, LLM-based zero-shot anomaly detection framework. The framework mitigates reliance on brittle, hand-crafted prompts by constructing a global anomaly knowledge base and distilling it into reusable analysis frameworks. Our experimental results reveal a clear division of strengths: LLMs are particularly effective at detecting contextual anomalies, whereas GNN-based methods remain superior for structural anomaly detection. Moreover, RAG-assisted prompting achieves performance comparable to human-designed prompts while eliminating manual prompt engineering, underscoring the practical value of our RAG-assisted zero-shot LLM anomaly detection framework.
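The decoupling described above, a global semantic signal and a local structural signal scored independently and then combined, can be illustrated with a minimal sketch. This is not the paper's method: `llm_judge` stands in for any LLM-based consistency scorer, and the degree-deviation statistic is a deliberately simple stand-in for a GNN-derived structural score.

```python
# Illustrative sketch of decoupled zero-shot anomaly scoring on a TAG:
# a semantic score (LLM judging text-vs-neighborhood consistency) and a
# structural score (a toy graph statistic standing in for a GNN signal)
# are computed independently and linearly combined. All names are assumptions.

def semantic_score(node_text, neighbor_texts, llm_judge):
    """llm_judge: any callable returning an inconsistency score in [0, 1]."""
    return llm_judge(node_text, neighbor_texts)

def structural_score(graph, node):
    """Toy structural signal: how far the node's degree sits from the mean degree."""
    degrees = {n: len(graph["edges"].get(n, [])) for n in graph["texts"]}
    mean = sum(degrees.values()) / len(degrees)
    spread = max(max(degrees.values()) - mean, 1e-9)
    return min(abs(degrees[node] - mean) / spread, 1.0)  # rough [0, 1] scaling

def anomaly_score(graph, node, llm_judge, alpha=0.5):
    """Combine the two decoupled signals; alpha weights semantic vs structural."""
    neighbors = [graph["texts"][n] for n in graph["edges"].get(node, [])]
    s_sem = semantic_score(graph["texts"][node], neighbors, llm_judge)
    s_str = structural_score(graph, node)
    return alpha * s_sem + (1 - alpha) * s_str
```

Because the two signals are computed separately, the LLM prompt never needs to encode graph structure, which is one way the framework can avoid brittle, hand-crafted prompts.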