🤖 AI Summary
This work addresses the challenge of detecting advanced persistent threats (APTs), which are stealthy and leave few observable traces in system provenance data. The authors propose a neuro-symbolic framework that integrates a graph autoencoder with rare pattern mining. The approach constructs a process behavior graph via k-nearest neighbors, trains a graph autoencoder to learn the relational structure of normal execution, and boosts the anomaly scores of processes that exhibit rare co-occurrence patterns. Evaluation on the DARPA Transparent Computing datasets shows that this single-model method substantially improves anomaly ranking quality over the baseline graph autoencoder, consistently outperforms individual context-based detectors, and is competitive with ensemble methods that aggregate multiple separate detectors, while also improving interpretability.
📝 Abstract
Advanced Persistent Threats (APTs) are sophisticated, long-term cyberattacks that are difficult to detect because they operate stealthily and often blend into normal system behavior. This paper presents a neuro-symbolic anomaly detection framework that combines a Graph Autoencoder (GAE) with rare pattern mining to identify APT-like activities in system-level provenance data. Our approach first constructs a process behavioral graph using k-Nearest Neighbors based on feature similarity, then learns normal relational structure using a Graph Autoencoder. Anomaly candidates are identified through deviations between the observed and reconstructed graph structure. To further improve detection, we integrate a rare pattern mining module that discovers infrequent behavioral co-occurrences and uses them to boost anomaly scores for processes exhibiting rare signatures. We evaluate the proposed method on the DARPA Transparent Computing datasets and show that rare-pattern boosting yields substantial gains in anomaly ranking quality over the baseline GAE. Compared with existing unsupervised approaches on the same benchmark, our single unified model consistently outperforms individual context-based detectors and achieves performance competitive with ensemble aggregation methods that require multiple separate detectors. These results highlight the value of coupling graph-based representation learning with classical pattern mining to improve both effectiveness and interpretability in provenance-based security anomaly detection.
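The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the distance metric, the pattern length, or the boosting rule, so Euclidean distance, attribute pairs, and a multiplicative boost with a hypothetical weight `alpha` are all assumptions made here for illustration. The Graph Autoencoder itself is omitted; `base_scores` stands in for its reconstruction error, and the function names (`knn_edges`, `rare_pairs`, `boost_scores`) are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def knn_edges(features, k):
    """Link each process to its k nearest neighbours (Euclidean distance, assumed)."""
    edges = set()
    for i, fi in enumerate(features):
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))  # store as an undirected edge
    return edges


def rare_pairs(transactions, max_support):
    """Return attribute pairs that occur in at most max_support of all processes."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(set(items)), 2))
    n = len(transactions)
    return {pair for pair, c in counts.items() if c / n <= max_support}


def boost_scores(base_scores, transactions, rare, alpha=0.5):
    """Scale each base score by (1 + alpha * number of rare pairs the process shows).

    The multiplicative rule and alpha are assumptions, not taken from the paper.
    """
    boosted = []
    for score, items in zip(base_scores, transactions):
        present = set(combinations(sorted(set(items)), 2))
        boosted.append(score * (1 + alpha * len(present & rare)))
    return boosted


# Toy data: four processes, each with a feature vector and a set of behaviours.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 6.0)]
transactions = [
    {"read_file", "open_socket"},
    {"read_file", "open_socket"},
    {"read_file", "open_socket"},
    {"read_file", "spawn_shell"},
]
base_scores = [0.2, 0.2, 0.2, 0.3]  # placeholder for GAE reconstruction error

edges = knn_edges(features, k=1)
rare = rare_pairs(transactions, max_support=0.25)
boosted = boost_scores(base_scores, transactions, rare)
```

In this toy setup only the last process exhibits a rare behaviour pair, so only its score is raised. The matched pair can also be reported alongside the score, which is one way the symbolic component could support interpretability.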