RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly Detection

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of detecting advanced persistent threats (APTs), which exhibit high stealth and low observability in system provenance data. To this end, the authors propose a neuro-symbolic framework that integrates graph autoencoders with rare pattern mining. The approach constructs a process behavior graph via k-nearest neighbors, employs a graph autoencoder to learn the structural characteristics of normal execution patterns, and enhances anomaly scoring by incorporating rare co-occurrence patterns. Experimental evaluation on the DARPA Transparent Computing dataset demonstrates that this single-model method significantly outperforms baseline graph autoencoders, achieving detection performance comparable to ensemble-based multi-detector systems while substantially improving both the quality of anomaly ranking and result interpretability.
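
The graph-construction step mentioned above can be illustrated with a minimal sketch. It assumes Euclidean distance as the similarity measure and symmetrized (undirected) edges; the summary only says the graph is built via k-nearest neighbors, so both choices, along with the function name `knn_graph` and the toy feature vectors, are hypothetical.

```python
import math


def knn_graph(features, k):
    """Link each node to its k nearest neighbors; return an undirected adjacency dict.

    Assumption: Euclidean distance stands in for "feature similarity".
    """
    n = len(features)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        # Distance from node i to every other node, closest first.
        dists = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adjacency[i].add(j)
            adjacency[j].add(i)  # symmetrize so edges are undirected
    return adjacency


# Hypothetical per-process feature vectors (illustrative values only).
features = [
    [0, 0],
    [1, 0],
    [0, 3],
    [10, 10],  # a process that behaves unlike the others
]
graph = knn_graph(features, k=1)
for node, neighbors in graph.items():
    print(node, sorted(neighbors))
```

Because edges are symmetrized, a node can end up with more than k neighbors. A graph autoencoder would then be trained on this adjacency structure; that training step is outside the scope of this sketch.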

📝 Abstract
Advanced Persistent Threats (APTs) are sophisticated, long-term cyberattacks that are difficult to detect because they operate stealthily and often blend into normal system behavior. This paper presents a neuro-symbolic anomaly detection framework that combines a Graph Autoencoder (GAE) with rare pattern mining to identify APT-like activities in system-level provenance data. Our approach first constructs a process behavioral graph using k-Nearest Neighbors based on feature similarity, then learns normal relational structure using a Graph Autoencoder. Anomaly candidates are identified through deviations between observed and reconstructed graph structure. To further improve detection, we integrate a rare pattern mining module that discovers infrequent behavioral co-occurrences and uses them to boost anomaly scores for processes exhibiting rare signatures. We evaluate the proposed method on the DARPA Transparent Computing datasets and show that rare-pattern boosting yields substantial gains in anomaly ranking quality over the baseline GAE. Compared with existing unsupervised approaches on the same benchmark, our single unified model consistently outperforms individual context-based detectors and achieves performance competitive with ensemble aggregation methods that require multiple separate detectors. These results highlight the value of coupling graph-based representation learning with classical pattern mining to improve both effectiveness and interpretability in provenance-based security anomaly detection.
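
The rare-pattern boosting idea described in the abstract might look roughly like the sketch below. It is not the authors' implementation: the abstract does not specify pattern length, support threshold, or boosting rule, so this sketch assumes patterns are pairs of co-occurring attributes, that a pair is "rare" when its relative support is at most `max_support`, and that each rare pair a process exhibits multiplies its baseline score by an extra `alpha`. The attribute names and baseline scores are invented stand-ins for GAE reconstruction errors.

```python
from collections import Counter
from itertools import combinations


def item_pairs(items):
    """All unordered attribute pairs in one process's attribute set."""
    return set(combinations(sorted(set(items)), 2))


def mine_rare_pairs(transactions, max_support):
    """Return attribute pairs whose relative support is at most max_support."""
    counts = Counter()
    for items in transactions:
        counts.update(item_pairs(items))
    n = len(transactions)
    return {pair for pair, c in counts.items() if c / n <= max_support}


def boost_scores(base_scores, transactions, rare_pairs, alpha=0.5):
    """Scale each baseline score by (1 + alpha * number of rare pairs exhibited).

    Assumption: a multiplicative boost; the paper may use a different rule.
    """
    boosted = []
    for score, items in zip(base_scores, transactions):
        hits = len(item_pairs(items) & rare_pairs)
        boosted.append(score * (1.0 + alpha * hits))
    return boosted


# Hypothetical per-process attribute sets (one "transaction" per process).
transactions = [
    {"read_file", "open_socket"},
    {"read_file", "open_socket"},
    {"read_file", "open_socket"},
    {"read_file", "write_file"},
    {"read_file", "write_file"},
    {"open_socket", "exec_shell"},
]
# Hypothetical baseline anomaly scores, standing in for reconstruction errors.
base_scores = [0.10, 0.12, 0.11, 0.20, 0.18, 0.19]

rare = mine_rare_pairs(transactions, max_support=0.2)
boosted = boost_scores(base_scores, transactions, rare)
ranking = sorted(range(len(boosted)), key=lambda i: boosted[i], reverse=True)
print("rare pairs:", rare)
print("ranking (most anomalous first):", ranking)
```

In this toy data the last process is the only one exhibiting a rare pair, so boosting lifts it above a process with a slightly higher baseline score. The matched rare pairs also serve as a human-readable explanation of why a process was flagged, which is the interpretability benefit the abstract points to.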
Problem

Research questions and friction points this paper is trying to address.

Advanced Persistent Threats
provenance-based anomaly detection
rare pattern mining
Graph Autoencoder
cybersecurity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Autoencoder
Rare Pattern Mining
Neuro-Symbolic
Provenance-Based Anomaly Detection
APT Detection
Asif Tauhid
New York University, NYUAD, Division of Science
Sidahmed Benabderrahmane
New York University, NYUAD, Division of Science
Mohamad Altrabulsi
New York University, NYUAD, Division of Science
Ahamed Foisal
New York University, NYUAD, Division of Science
Talal Rahwan
Associate Professor of Computer Science, New York University Abu Dhabi
Artificial Intelligence, Computational Social Science, Game Theory