AI Summary
This work addresses advanced persistent threat (APT) attacks, which are highly stealthy and evolve over multiple stages, and for which labeled samples are scarce and costly to annotate. These properties make conventional single-point defenses ineffective at capturing long-range semantic dependencies across entities. To overcome these limitations, we propose a provenance-graph-based multi-view collaborative learning framework that identifies APT behavior at the node level, in unsupervised or weakly supervised settings, through multi-view feature extraction and anomaly detection. A co-training mechanism improves the model's generalization to diverse and previously unseen attack tactics and techniques. Experiments on three real-world APT datasets show that the method significantly improves cross-scenario detection performance and node-level detection under label scarcity, which supports practical deployment.
Abstract
Advanced persistent threats (APTs) are stealthy and multi-stage, making single-point defenses (e.g., malware- or traffic-based detectors) ill-suited to capturing long-range, cross-entity attack semantics. Provenance-graph analysis has become a prominent approach to APT detection. However, its practical deployment is hampered by (i) the scarcity of APT samples, (ii) the cost and difficulty of fine-grained labeling of APT samples, and (iii) the diversity of attack tactics and techniques. To address these problems, this paper proposes APT-MCL, an intelligent APT detection system based on Multi-view Collaborative provenance graph Learning. APT-MCL adopts an unsupervised learning strategy that discovers APT attacks at the node level via anomaly detection. It then builds multiple anomaly detection sub-models on multi-view features and integrates them within a collaborative learning framework to adapt to diverse attack scenarios. Extensive experiments on three real-world APT datasets validate the approach: (i) multi-view features improve cross-scenario generalization, and (ii) co-training substantially boosts node-level detection under label scarcity, enabling practical deployment across diverse attack scenarios.
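The abstract does not specify which detector each view uses or how the sub-models exchange information during co-training. As a rough illustration only, here is one generic co-training-style loop for unsupervised node-level anomaly detection over several feature views. Everything in it is an assumption rather than APT-MCL's actual method: the view names `process` and `io`, the toy feature vectors, the mean-absolute-z-score detector, and the exchange rule in which each view's most confident anomalies are removed from the other views' presumed-benign reference sets.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set, Tuple

# Hypothetical data layout: node_id -> feature vector for ONE view.
Vectors = Dict[str, List[float]]


def view_scores(vectors: Vectors, reference: Set[str]) -> Dict[str, float]:
    """Score each node by its mean absolute z-score against a presumed-benign reference set."""
    dims = len(next(iter(vectors.values())))
    ref = [vectors[n] for n in reference]
    mu = [mean(v[d] for v in ref) for d in range(dims)]
    # Fall back to 1.0 when a dimension has zero spread, to avoid division by zero.
    sd = [pstdev([v[d] for v in ref]) or 1.0 for d in range(dims)]
    return {
        n: sum(abs(x[d] - mu[d]) / sd[d] for d in range(dims)) / dims
        for n, x in vectors.items()
    }


def co_train(views: Dict[str, Vectors], rounds: int = 3, top_k: int = 1,
             min_ref: int = 2) -> List[Tuple[str, float]]:
    """Assumed co-training rule: each view's top_k most anomalous nodes are
    dropped from the OTHER views' benign references before the next round."""
    nodes = list(next(iter(views.values())))
    refs = {name: set(nodes) for name in views}  # one reference set per view
    for _ in range(rounds):
        scores = {name: view_scores(vecs, refs[name]) for name, vecs in views.items()}
        for name in views:
            flagged = set(sorted(nodes, key=scores[name].get, reverse=True)[:top_k])
            for other in views:
                # Keep at least min_ref nodes so every baseline stays estimable.
                if other != name and len(refs[other] - flagged) >= min_ref:
                    refs[other] -= flagged
    # Final node score: average of per-view scores under the cleaned references.
    final = {name: view_scores(vecs, refs[name]) for name, vecs in views.items()}
    combined = {n: mean(final[name][n] for name in views) for n in nodes}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up features for six provenance-graph nodes in two hypothetical views.
views = {
    "process": {"n1": [1.0, 1.0], "n2": [1.1, 0.9], "n3": [0.9, 1.1],
                "n4": [1.0, 1.05], "n5": [1.05, 0.95], "evil": [9.0, 8.0]},
    "io": {"n1": [0.5, 2.0], "n2": [0.6, 2.1], "n3": [0.4, 1.9],
           "n4": [0.55, 2.0], "n5": [0.5, 2.05], "evil": [7.0, 0.1]},
}
ranking = co_train(views)
print(ranking[0][0])  # -> evil
```

Removing one view's confident anomalies from another view's baseline keeps a contaminated profile from masking attacks that a single view would otherwise miss; a practical system would likely replace the z-score detector with learned per-view models over provenance-graph node features.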