RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly Detection

📅 2026-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of detecting advanced persistent threats (APTs), which exhibit high stealth and low observability in system provenance data. To this end, the authors propose a neuro-symbolic framework that integrates graph autoencoders with rare pattern mining. The approach constructs a process behavior graph via k-nearest neighbors, employs a graph autoencoder to learn the structural characteristics of normal execution patterns, and enhances anomaly scoring by incorporating rare co-occurrence patterns. Experimental evaluation on the DARPA Transparent Computing dataset demonstrates that this single-model method significantly outperforms baseline graph autoencoders, achieving detection performance comparable to ensemble-based multi-detector systems while substantially improving both the quality of anomaly ranking and result interpretability.
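The graph-construction step can be illustrated with a minimal sketch. It assumes each process is already represented as a numeric feature vector and that similarity is measured by Euclidean distance; the summary does not state the metric, the value of k, or whether edges are symmetrized, so `knn_graph`, the toy `features`, and `k=2` are all hypothetical choices rather than the authors' implementation.

```python
import math


def knn_graph(features, k):
    """Build an undirected k-nearest-neighbor graph.

    Returns {node: set(neighbors)}. Each node is linked to its k closest
    nodes by Euclidean distance (an assumed metric), and every edge is
    mirrored so the adjacency is symmetric.
    """
    n = len(features)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        # Distance from node i to every other node, nearest first.
        dists = sorted(
            (math.dist(features[i], features[j]), j)
            for j in range(n)
            if j != i
        )
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


# Toy data: three similar processes and one that behaves very differently.
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [1.0, 0.1, 0.1],
    [0.0, 5.0, 4.0],
]
graph = knn_graph(features, k=2)
```

Note that a k-nearest-neighbor graph still attaches the outlying process to its closest peers, so isolation alone does not flag it; in the described approach the autoencoder's reconstruction of this structure is what supplies the anomaly signal.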

📝 Abstract

Advanced Persistent Threats (APTs) are sophisticated, long-term cyberattacks that are difficult to detect because they operate stealthily and often blend into normal system behavior. This paper presents a neuro-symbolic anomaly detection framework that combines a Graph Autoencoder (GAE) with rare pattern mining to identify APT-like activities in system-level provenance data. Our approach first constructs a process behavioral graph using k-Nearest Neighbors based on feature similarity, then learns normal relational structure using a Graph Autoencoder. Anomaly candidates are identified through deviations between observed and reconstructed graph structure. To further improve detection, we integrate a rare pattern mining module that discovers infrequent behavioral co-occurrences and uses them to boost anomaly scores for processes exhibiting rare signatures. We evaluate the proposed method on the DARPA Transparent Computing datasets and show that rare-pattern boosting yields substantial gains in anomaly ranking quality over the baseline GAE. Compared with existing unsupervised approaches on the same benchmark, our single unified model consistently outperforms individual context-based detectors and achieves performance competitive with ensemble aggregation methods that require multiple separate detectors. These results highlight the value of coupling graph-based representation learning with classical pattern mining to improve both effectiveness and interpretability in provenance-based security anomaly detection.
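The rare-pattern boosting idea can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: patterns are restricted to attribute pairs, "rare" means relative support at or below a hypothetical `max_support` threshold, and the boost is a simple additive term scaled by a hypothetical `weight`. The base scores stand in for GAE reconstruction errors and are made-up values.

```python
from collections import Counter
from itertools import combinations


def rare_patterns(transactions, max_support, size=2):
    """Return itemsets of the given size that occur at least once but
    whose relative support is at most max_support (assumed criterion)."""
    counts = Counter()
    for items in transactions:
        for combo in combinations(sorted(items), size):
            counts[combo] += 1
    n = len(transactions)
    return {p for p, c in counts.items() if c / n <= max_support}


def boost_scores(base_scores, transactions, patterns, weight):
    """Add weight * (number of matched rare patterns) to each base score.
    The additive form is an assumption made for illustration."""
    boosted = []
    for score, items in zip(base_scores, transactions):
        hits = sum(1 for p in patterns if set(p) <= items)
        boosted.append(score + weight * hits)
    return boosted


# Hypothetical behavioral attributes per process.
behaviors = [
    {"read_file", "net_connect"},
    {"read_file", "net_connect"},
    {"read_file", "net_connect"},
    {"read_file", "net_connect"},
    {"read_file", "spawn_shell"},
]
# Placeholder reconstruction-error scores (not real results).
base = [0.10, 0.12, 0.11, 0.30, 0.25]

rare = rare_patterns(behaviors, max_support=0.2)
boosted = boost_scores(base, behaviors, rare, weight=0.5)
ranking = sorted(range(len(boosted)), key=lambda i: boosted[i], reverse=True)
```

In this toy run the last process ranks second on base score alone, but its rare co-occurrence of `read_file` and `spawn_shell` lifts it to the top of the ranking. The matched pattern also serves as a human-readable reason for the alert, which is the interpretability benefit the abstract describes.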

Problem

Research questions and friction points this paper is trying to address.

Advanced Persistent Threats

provenance-based anomaly detection

rare pattern mining

Graph Autoencoder

cybersecurity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Autoencoder

Rare Pattern Mining

Neuro-Symbolic