Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

📅 2024-06-21
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
This paper addresses the long-standing fragmentation between unsupervised graph-level anomaly detection (GLAD) and graph-level out-of-distribution detection (GLOD), along with their inconsistent evaluation protocols. The authors introduce UB-GOLD, the first unified benchmark for this setting, comprising 35 datasets across four realistic application scenarios and enabling systematic evaluation of 18 methods. UB-GOLD unifies task definitions and evaluation paradigms for GLAD and GLOD, and establishes a multidimensional analytical framework assessing effectiveness, OOD sensitivity, robustness, and efficiency. Covering methods based on unsupervised graph representation learning, reconstruction-error modeling, and statistical outlier scoring, it supports cross-scenario generalization assessment. Experiments reveal that many existing GLAD methods generalize poorly to GLOD tasks. To foster reproducibility, the authors release an open-source, standardized codebase and evaluation protocol, accelerating the development of safe and robust graph learning systems.
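Benchmarks like this one typically score each graph (e.g., by reconstruction error) and evaluate how well the scores separate in-distribution from OOD/anomalous graphs, most commonly with AUROC. As a minimal illustration (not the benchmark's actual code), the sketch below computes AUROC from toy anomaly scores via the rank-based Mann-Whitney formulation; the score values are invented for the example.

```python
def auroc(id_scores, ood_scores):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    randomly chosen OOD graph scores higher than a randomly chosen
    in-distribution graph (ties count as 0.5)."""
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))

# Toy scores standing in for per-graph reconstruction errors:
# ID graphs reconstruct well (low error), OOD graphs poorly (high error).
id_scores = [0.10, 0.20, 0.15, 0.30]
ood_scores = [0.80, 0.60, 0.25]

print(round(auroc(id_scores, ood_scores), 4))  # → 0.9167
```

A perfect detector (every OOD score above every ID score) yields 1.0, and a random scorer hovers around 0.5, which is what makes AUROC a threshold-free way to compare GLAD and GLOD methods on a common footing.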

📝 Abstract
To build safe and reliable graph machine learning systems, unsupervised graph-level anomaly detection (GLAD) and unsupervised graph-level out-of-distribution (OOD) detection (GLOD) have received significant attention in recent years. Though those two lines of research indeed share the same objective, they have been studied independently in the community due to distinct evaluation setups, creating a gap that hinders the application and evaluation of methods from one to the other. To bridge the gap, in this work, we present a Unified Benchmark for unsupervised Graph-level OOD and anomaLy Detection (UB-GOLD), a comprehensive evaluation framework that unifies GLAD and GLOD under the concept of generalized graph-level OOD detection. Our benchmark encompasses 35 datasets spanning four practical anomaly and OOD detection scenarios, facilitating the comparison of 18 representative GLAD/GLOD methods. We conduct multi-dimensional analyses to explore the effectiveness, OOD sensitivity spectrum, robustness, and efficiency of existing methods, shedding light on their strengths and limitations. Furthermore, we provide an open-source codebase of UB-GOLD (https://github.com/UB-GOLD/UB-GOLD) to foster reproducible research and outline potential directions for future investigations based on our insights.
Problem

Research questions and friction points this paper is trying to address.

Unifying graph-level anomaly and OOD detection methods
Bridging evaluation gap between GLAD and GLOD research
Providing benchmark for generalized graph-level OOD detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified benchmark for graph-level anomaly detection
Comprehensive evaluation framework for GLAD and GLOD
Open-source codebase for reproducible research