KnowHow: Automatically Applying High-Level CTI Knowledge for Interpretable and Accurate Provenance Analysis

📅 2025-09-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To bridge the semantic gap between high-level cyber threat intelligence (CTI) and low-level system logs, which hinders effective APT detection, this paper proposes KnowHow, a CTI-knowledge-driven online provenance analysis approach that automatically and interpretably maps natural-language CTI reports (e.g., ATT&CK descriptions) to system provenance events. Its core is gIoC, a novel attack knowledge representation that captures the subject, object, and actions of an attack. KnowHow lifts system identifiers such as file paths to natural-language terms, matches system events to gIoCs and then to techniques described in natural language, and reasons about the temporal logic of attack steps. Evaluated on open-source and industrial datasets, KnowHow detects all 16 APT campaigns, reduces node-level false positives by up to 90% while achieving higher node-level recall, and is robust against several unknown attacks and mimicry attacks. The main contribution is an end-to-end, interpretable pipeline for turning CTI into actionable detection logic.

📝 Abstract

High-level natural language knowledge in CTI reports, such as the ATT&CK framework, is beneficial for countering APT attacks. However, how to automatically apply this high-level knowledge in realistic attack detection systems, such as provenance analysis systems, is still an open problem. The challenge stems from the semantic gap between the knowledge and the low-level security logs: while the knowledge in CTI reports is written in natural language, attack detection systems can only process low-level system events like file accesses or network IP manipulations. Manual approaches can be labor-intensive and error-prone. In this paper, we propose KnowHow, a CTI-knowledge-driven online provenance analysis approach that can automatically apply high-level attack knowledge from CTI reports written in natural language to detect attacks in low-level system events. The core of KnowHow is a novel attack knowledge representation, gIoC, that represents the subject, object, and actions of attacks. By lifting system identifiers, such as file paths, in system events to natural language terms, KnowHow can match system events to gIoCs and further match them to techniques described in natural language. Finally, based on the techniques matched to system events, KnowHow reasons about the temporal logic of attack steps and detects potential APT attacks in system events. Our evaluation shows that KnowHow can accurately detect all 16 APT campaigns in the open-source and industrial datasets, while existing approaches all introduce large numbers of false positives. Our evaluation also shows that KnowHow reduces node-level false positives by up to 90% while achieving a higher node-level recall, and that it is robust against several unknown attacks and mimicry attacks.
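As a rough illustration of the lifting and matching steps described in the abstract, the sketch below models a gIoC as a subject-action-object triple and lifts raw system identifiers to natural-language terms before comparing. Everything here is an assumption made for illustration: the `GIoC` and `Event` classes, the `LIFT_RULES` table, the `SYSCALL_TO_ACTION` mapping, and exact-equality matching are hypothetical and are not taken from the paper, which may represent and match gIoCs quite differently.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GIoC:
    """Hypothetical subject-action-object triple in natural-language terms."""
    subject: str
    action: str
    obj: str


@dataclass(frozen=True)
class Event:
    """Hypothetical low-level provenance event."""
    process: str   # path of the acting process
    syscall: str   # raw system call name
    target: str    # path of the object acted upon


# Assumed lifting rules: regex over a raw identifier -> natural-language term.
LIFT_RULES = [
    (re.compile(r"^/etc/(passwd|shadow)$"), "credential file"),
    (re.compile(r"(^|/)(bash|sh|zsh)$"), "shell"),
    (re.compile(r"^/tmp/"), "temporary file"),
]

# Assumed mapping from system calls to natural-language actions.
SYSCALL_TO_ACTION = {"open": "read", "read": "read",
                     "write": "write", "execve": "execute"}


def lift(identifier: str) -> str:
    """Return the first matching natural-language term, or 'unknown'."""
    for pattern, term in LIFT_RULES:
        if pattern.search(identifier):
            return term
    return "unknown"


def lift_event(event: Event) -> GIoC:
    """Lift every field of a raw event into natural-language terms."""
    return GIoC(
        subject=lift(event.process),
        action=SYSCALL_TO_ACTION.get(event.syscall, "unknown"),
        obj=lift(event.target),
    )


def matches(event: Event, pattern: GIoC) -> bool:
    """Exact-equality match; a real system would likely match more loosely."""
    return lift_event(event) == pattern


credential_access = GIoC("shell", "read", "credential file")
suspicious = Event("/bin/bash", "open", "/etc/shadow")
benign = Event("/bin/bash", "write", "/tmp/build.log")
```

With these assumed rules, `matches(suspicious, credential_access)` holds while `matches(benign, credential_access)` does not, because the benign event lifts to a write on a temporary file.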

Problem

Research questions and friction points this paper is trying to address.

Bridging semantic gap between high-level CTI knowledge and low-level system logs

Automating application of natural language attack descriptions to provenance analysis

Reducing false positives in APT attack detection through contextual event matching

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically applies high-level CTI knowledge to provenance analysis

Uses a novel gIoC representation of attack subjects, objects, and actions

Lifts system identifiers to natural language for technique matching
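Once events have been matched to techniques, the abstract says KnowHow reasons about the temporal logic of attack steps. This page does not describe that logic, so the following is only a minimal stand-in under stated assumptions: it checks whether an expected sequence of stages appears in timestamp order among the matched events. The function name `follows_chain`, the `(timestamp, stage)` tuple layout, and the three-stage chain are hypothetical; the stage labels reuse ATT&CK tactic names purely as examples.

```python
from typing import List, Sequence, Tuple

# Hypothetical match record: (timestamp, stage label of the matched technique).
Match = Tuple[float, str]


def follows_chain(matches: List[Match], chain: Sequence[str]) -> bool:
    """Return True if the stages in `chain` occur in time order in `matches`.

    This is an ordered-subsequence check: other stages may be interleaved,
    but each stage in `chain` must appear after the previous one.
    """
    ordered = sorted(matches, key=lambda m: m[0])  # sort by timestamp
    idx = 0  # position of the next stage we are waiting for
    for _, stage in ordered:
        if idx < len(chain) and stage == chain[idx]:
            idx += 1
    return idx == len(chain)


# Assumed example chain; stage labels borrowed from ATT&CK tactic names.
CHAIN = ["initial access", "execution", "exfiltration"]

in_order = [(3.0, "exfiltration"), (1.0, "initial access"), (2.0, "execution")]
out_of_order = [(1.0, "exfiltration"), (2.0, "initial access"), (3.0, "execution")]
```

Here `follows_chain(in_order, CHAIN)` is true because sorting by timestamp recovers the expected order, while `follows_chain(out_of_order, CHAIN)` is false because exfiltration precedes the other stages. Requiring an ordered chain rather than isolated technique hits is one plausible way contextual matching could suppress false positives.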
