How to use Graph Data in the Wild to Help Graph Anomaly Detection?

📅 2025-06-04

🏛️ Knowledge Discovery and Data Mining

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Graph anomaly detection faces three key challenges: label scarcity, ambiguous anomaly definitions, and difficulty modeling the normal data distribution when data is limited. This paper proposes Wild-GAD, the first framework to enable cross-domain knowledge transfer from large-scale, heterogeneous “in-the-wild” graph data. Methodologically, it introduces (i) a unified graph database (UniWildGraph) with a shared feature space; (ii) an external-graph selection criterion that balances representativeness and diversity; and (iii) an unsupervised detection paradigm based on transfer learning. Evaluated on six real-world datasets, Wild-GAD achieves average improvements of 18% in AUC-ROC and 32% in AUC-PR over the best competing methods. The work offers a scalable, annotation-free way to enhance low-resource graph anomaly detection.

📝 Abstract

In recent years, graph anomaly detection has gained considerable attention and has found extensive applications in various domains such as social, financial, and communication networks. However, anomalies in graph-structured data present unique challenges, including label scarcity, ill-defined anomalies, and varying anomaly types, making supervised or semi-supervised methods unreliable. Researchers often adopt unsupervised approaches to address these challenges, assuming that anomalies deviate significantly from the normal data distribution. Yet, when the available data is insufficient, capturing the normal distribution accurately and comprehensively becomes difficult. To overcome this limitation, we propose to utilize external graph data (i.e., graph data in the wild) to help anomaly detection tasks. This naturally raises the question: How can we use external data to help graph anomaly detection tasks? To answer this question, we propose a novel framework, Wild-GAD. Our framework is built upon a unified database, UniWildGraph, which comprises a large and diverse collection of graph data with broad domain coverage, ample data volume, and a unified feature space. We further develop selection criteria based on representativity and diversity to identify the most suitable external data for each anomaly detection task. Extensive experiments on six real-world test datasets demonstrate the effectiveness of Wild-GAD. Compared with the best competing baseline methods, our framework achieves average improvements of 18% in AUCROC and 32% in AUCPR.
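The two reported metrics, AUCROC and AUCPR, can be illustrated with a minimal pure-Python sketch. This is not the paper's evaluation code: the function names and the toy labels and scores are assumptions made for illustration, and AUCPR is approximated here by average precision, a common but not the only estimator.

```python
def auc_roc(labels, scores):
    """Probability that a random anomaly scores higher than a random normal node.

    Ties between an anomaly and a normal node count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one anomaly and one normal node")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each true anomaly.

    Tied scores are ordered arbitrarily (by input position), which is a
    simplification compared with library implementations.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one anomaly")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_positives


# Toy example: 1 marks an anomalous node, scores are hypothetical detector outputs.
toy_labels = [0, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.9]
print(auc_roc(toy_labels, toy_scores))
print(average_precision(toy_labels, toy_scores))
```

AUCPR is usually the more sensitive of the two when anomalies are rare, because precision drops quickly once normal nodes are ranked above true anomalies.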

Problem

Research questions and friction points this paper is trying to address.

Addressing label scarcity and ill-defined anomalies in graph data

Utilizing external graph data to improve anomaly detection

Developing criteria for selecting optimal external graph datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes external graph data for anomaly detection

Introduces Wild-GAD framework with UniWildGraph database

Employs representativity and diversity selection criteria
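The summary does not spell out how representativity and diversity are scored, so the following is only a sketch of one way such a trade-off could work. It uses a swapped-in technique, greedy maximal-marginal-relevance selection over per-graph embedding vectors, not the criterion defined in the paper. The function names, the `trade_off` parameter, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_external_graphs(target, candidates, k, trade_off=0.5):
    """Greedily pick up to k candidate graphs (hypothetical MMR-style criterion).

    target:     embedding of the target graph (assumed to be precomputed)
    candidates: dict mapping graph name -> embedding in the same feature space
    trade_off:  1.0 scores representativity only; lower values penalise
                candidates that resemble graphs already selected
    """
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        best_name, best_score = None, -math.inf
        for name, emb in remaining.items():
            representativity = cosine(target, emb)
            # Redundancy: similarity to the closest already-selected graph.
            redundancy = max(
                (cosine(emb, candidates[s]) for s in selected), default=0.0
            )
            score = trade_off * representativity - (1 - trade_off) * redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
        del remaining[best_name]
    return selected


# Toy embeddings: "a" and "b" are near-duplicates close to the target, "c" differs.
target_embedding = [1.0, 0.0]
pool = {"a": [1.0, 0.1], "b": [1.0, 0.12], "c": [0.6, 0.8]}
print(select_external_graphs(target_embedding, pool, k=2, trade_off=0.3))
print(select_external_graphs(target_embedding, pool, k=2, trade_off=1.0))
```

With a low `trade_off`, the second pick favors the dissimilar graph `c` over the near-duplicate `b`. With `trade_off=1.0`, selection falls back to pure similarity to the target.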

🔎 Similar Papers

Graph Anomaly Detection in Time Series: A Survey