AI Summary
This work addresses two challenges in inductive graph anomaly detection, dynamic graph modeling and extreme class imbalance, by proposing a data-centric framework. The approach integrates a discrete autoregressive graph diffusion model with a curriculum-based anomaly augmentation mechanism: the former generates local subgraphs that reflect the structural distribution of real anomalies, while the latter emphasizes underrepresented anomaly patterns during training, so that data generation adapts as training proceeds and stays balanced. By addressing the interdependence between static modeling and data imbalance, the framework improves detection performance and generalization to unseen anomalous nodes across five benchmark datasets.
Abstract
Graph anomaly detection (GAD) is crucial in applications like fraud detection and cybersecurity. Despite recent advances with graph neural networks (GNNs), two major challenges persist. At the model level, most methods adopt a transductive learning paradigm, which assumes a static graph structure and is therefore unsuitable for dynamic, evolving networks. At the data level, extreme class imbalance, in which anomalous nodes are rare, leads to biased models that fail to generalize to unseen anomalies. These challenges are interdependent: static transductive frameworks limit effective data augmentation, while imbalance exacerbates model distortion in inductive learning settings. To address these challenges, we propose a novel data-centric framework that integrates dynamic graph modeling with balanced anomaly synthesis. Our framework features: (1) a discrete ego-graph diffusion model, which captures the local topology of anomalies to generate ego-graphs aligned with the structural distribution of anomalies, and (2) a curriculum anomaly augmentation mechanism, which dynamically adjusts synthetic data generation during training, focusing on underrepresented anomaly patterns to improve detection and generalization. Experiments on five datasets demonstrate the effectiveness of our framework.
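To make the term "ego-graph" concrete, the following is a minimal sketch of extracting the local subgraph around a single node. It is not the paper's implementation: the function name `ego_graph`, the adjacency-list representation, and the `radius` parameter are assumptions chosen for illustration, and the diffusion model that would be trained on such subgraphs is not shown.

```python
from collections import deque


def ego_graph(adj, center, radius=1):
    """Return (nodes, edges) of the subgraph induced by all nodes
    within `radius` hops of `center` in an undirected graph.

    `adj` maps each node to an iterable of its neighbours
    (hypothetical representation, chosen for this sketch).
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the requested radius
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    # Keep every edge whose two endpoints both fall inside the ego-graph.
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adj.get(u, ())
        if v in nodes
    }
    return nodes, edges


# Toy undirected graph; node 0 plays the role of a labelled anomaly.
toy_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
one_hop_nodes, one_hop_edges = ego_graph(toy_adj, center=0, radius=1)
two_hop_nodes, two_hop_edges = ego_graph(toy_adj, center=0, radius=2)
```

Because each ego-graph depends only on a node's neighbourhood rather than on the full graph, this kind of local view can be computed for nodes that appear after training, which is one reason local subgraphs suit an inductive setting.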