APT-CGLP: Advanced Persistent Threat Hunting via Contrastive Graph-Language Pre-Training

📅 2025-11-25
🤖 AI Summary
To address the cross-modal heterogeneity between provenance graphs and Cyber Threat Intelligence (CTI) reports, the information loss caused by graph extraction, and the heavy reliance on manual annotation, this paper proposes an end-to-end cross-modal semantic matching framework. The method combines large language model-driven synthesis of graph-text pairs, multi-objective contrastive graph-language pre-training, and cross-modal masked modeling to align attack semantics at both coarse- and fine-grained levels. It does not extract explicit attack graphs from CTI reports, which avoids the associated information loss and reduces human intervention. Experiments on four real-world APT datasets show that the approach outperforms state-of-the-art threat hunting baselines in both accuracy and efficiency.

📝 Abstract
Provenance-based threat hunting identifies Advanced Persistent Threats (APTs) on endpoints by correlating attack patterns described in Cyber Threat Intelligence (CTI) with provenance graphs derived from system audit logs. A fundamental challenge in this paradigm lies in the modality gap, the structural and semantic disconnect between provenance graphs and CTI reports. Prior work addresses this by framing threat hunting as a graph matching task: 1) extracting attack graphs from CTI reports, and 2) aligning them with provenance graphs. However, this pipeline incurs severe information loss during graph extraction and demands intensive manual curation, undermining scalability and effectiveness. In this paper, we present APT-CGLP, a novel cross-modal APT hunting system via Contrastive Graph-Language Pre-training, facilitating end-to-end semantic matching between provenance graphs and CTI reports without human intervention. First, empowered by the Large Language Model (LLM), APT-CGLP mitigates data scarcity by synthesizing high-fidelity provenance graph-CTI report pairs, while simultaneously distilling actionable insights from noisy web-sourced CTIs to improve their operational utility. Second, APT-CGLP incorporates a tailored multi-objective training algorithm that synergizes contrastive learning with inter-modal masked modeling, promoting cross-modal attack semantic alignment at both coarse- and fine-grained levels. Extensive experiments on four real-world APT datasets demonstrate that APT-CGLP consistently outperforms state-of-the-art threat hunting baselines in terms of accuracy and efficiency.
Problem

Research questions and friction points this paper is trying to address.

Bridging the modality gap between provenance graphs and CTI reports for threat hunting
Eliminating information loss and manual curation in traditional graph matching approaches
Enabling end-to-end semantic matching between attack patterns and system logs
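
As a rough illustration of what end-to-end semantic matching can look like at query time, the sketch below ranks provenance-graph embeddings by cosine similarity to a CTI report embedding. It assumes both encoders already map their inputs into a shared vector space. The names `cosine` and `hunt`, the `0.8` threshold, and the toy vectors are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def hunt(report_vec, graph_vecs, threshold=0.8):
    """Return (graph_id, score) pairs scoring at or above threshold, best first.

    report_vec: embedding of one CTI report (assumed output of a text encoder).
    graph_vecs: dict mapping graph id -> embedding (assumed output of a graph encoder).
    threshold:  illustrative cut-off, not a value from the paper.
    """
    scored = [(gid, cosine(report_vec, vec)) for gid, vec in graph_vecs.items()]
    hits = [(gid, score) for gid, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy embeddings, made up for illustration only.
report = [1.0, 0.0, 1.0]
graphs = {
    "g_benign": [0.0, 1.0, 0.0],
    "g_suspicious": [2.0, 0.0, 2.0],
    "g_partial": [1.0, 1.0, 0.0],
}
hits = hunt(report, graphs)
```

Because the comparison happens directly between embeddings, no attack graph has to be extracted from the report first, which is the step the abstract identifies as the source of information loss.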
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses an LLM to synthesize graph-report training pairs
Combines contrastive learning with masked modeling
Enables end-to-end semantic matching without human intervention
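
The contrastive component can be pictured with a generic symmetric InfoNCE loss of the kind used in CLIP-style pre-training. This is a minimal sketch under that assumption: the summary does not give the paper's exact objective, and the function names and the `temperature=0.1` value are hypothetical. The masked-modeling term is not shown.

```python
import math

def log_softmax_row(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_info_nce(sim, temperature=0.1):
    """Symmetric contrastive loss over a square similarity matrix.

    sim[i][j] is the similarity between graph i and report j.
    Matched graph-report pairs are assumed to sit on the diagonal.
    """
    n = len(sim)
    scaled = [[s / temperature for s in row] for row in sim]
    # Graph-to-text direction: each row should put its mass on the diagonal entry.
    g2t = -sum(log_softmax_row(scaled[i])[i] for i in range(n)) / n
    # Text-to-graph direction: the same check, applied column-wise.
    cols = [[scaled[i][j] for i in range(n)] for j in range(n)]
    t2g = -sum(log_softmax_row(cols[j])[j] for j in range(n)) / n
    return 0.5 * (g2t + t2g)

# Toy similarity matrices, made up for illustration only.
aligned = [[1.0, 0.0], [0.0, 1.0]]    # matched pairs score highest
confused = [[0.0, 1.0], [1.0, 0.0]]   # mismatched pairs score highest
loss_aligned = symmetric_info_nce(aligned)
loss_confused = symmetric_info_nce(confused)
```

The loss is small when each graph is most similar to its own report and large when it is more similar to another one, which is the coarse-grained alignment signal; fine-grained alignment would come from the masked-modeling objective mentioned above.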
Xuebo Qiu
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
Mingqi Lv
College of Geoinformatics, Zhejiang University of Technology, Zhejiang Key Laboratory of Visual Information Intelligent Processing, Hangzhou, China
Yimei Zhang
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
Tieming Chen
College of Geoinformatics, Zhejiang University of Technology, Zhejiang Key Laboratory of Visual Information Intelligent Processing, Hangzhou, China
Tiantian Zhu
Zhejiang University of Technology
Mobile Security, System Security, Artificial Intelligence
Qijie Song
College of Geoinformatics, Zhejiang University of Technology, Hangzhou, China
Shouling Ji
Professor, Zhejiang University & Georgia Institute of Technology
Data-driven Security, AI Security, Software Security, Privacy