🤖 AI Summary
This work addresses the challenge of delayed updates in existing security knowledge bases, which hinder timely responses to emerging cyber threats. To bridge this gap, the authors propose SynAT, a novel end-to-end method that automatically constructs attack trees from unstructured, crowdsourced security discussions. SynAT first employs a large language model with prompt-based learning to identify sentences containing attack-related information, then leverages a transition-based joint event and relation extraction model to capture key elements, and finally applies heuristic rules to synthesize attack trees. Evaluated on 5,070 Stack Overflow posts, SynAT outperforms baseline approaches in both event/relation extraction and attack tree similarity. The method has been successfully applied to enhance Huawei’s internal knowledge base as well as public repositories such as CVE and CAPEC, effectively reducing the timeliness gap between community-driven insights and formal security knowledge bases.
📝 Abstract
Cyber attacks have become a serious threat to the security of software systems. Many organizations have built their own security knowledge bases to safeguard against attacks and vulnerabilities. However, because official security information is released with a time lag, these knowledge bases may not be kept up to date, which makes it challenging to use them to protect software systems against emergent security risks. Meanwhile, security posts on online knowledge-sharing platforms contain many crowd security discussions, and the knowledge in those posts can be used to enhance the security knowledge bases. This paper proposes SynAT, an automatic approach to synthesizing attack trees from crowd security posts. Given a security post, SynAT first uses a Large Language Model (LLM) with prompt learning to restrict the scope to sentences that may contain attack information; it then uses a transition-based event and relation extraction model to extract events and relations simultaneously from that scope; finally, it applies heuristic rules to synthesize attack trees from the extracted events and relations. An experimental evaluation on 5,070 Stack Overflow security posts shows that SynAT outperforms all baselines in both event and relation extraction and achieves the highest tree similarity in attack tree synthesis. Furthermore, SynAT has been applied to enhance HUAWEI's security knowledge base as well as the public security knowledge bases CVE and CAPEC, which demonstrates its practicality.
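To make the final synthesis stage more concrete, the sketch below shows one way extracted events and relations could be assembled into a tree. It is an illustration only, not SynAT's implementation: the abstract does not specify the heuristic rules, so the `Node` class, the `synthesize_tree` and `render` functions, the `(parent, child, gate)` relation format, the `AND`/`OR` gate labels, and the "event that is never a child becomes the root" rule are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One attack-tree node (hypothetical structure, not from the paper)."""
    event: str
    gate: str = "OR"  # assumed labels: how children combine, "AND" or "OR"
    children: List["Node"] = field(default_factory=list)


def synthesize_tree(events: List[str],
                    relations: List[Tuple[str, str, str]]) -> Node:
    """Assemble a tree from events and (parent, child, gate) relations.

    Assumed heuristic: the single event that never appears as a child
    is treated as the root, i.e. the attacker's overall goal.
    """
    nodes = {event: Node(event) for event in events}
    seen_as_child = set()
    for parent, child, gate in relations:
        nodes[parent].gate = gate
        nodes[parent].children.append(nodes[child])
        seen_as_child.add(child)
    roots = [event for event in events if event not in seen_as_child]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root event, found {len(roots)}")
    return nodes[roots[0]]


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, marking the gate on inner nodes."""
    label = f" [{node.gate}]" if node.children else ""
    lines = ["  " * depth + node.event + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Made-up events and relations standing in for the extraction stage's output.
events = ["steal session", "inject script", "victim loads page"]
relations = [
    ("steal session", "inject script", "AND"),
    ("steal session", "victim loads page", "AND"),
]
tree = synthesize_tree(events, relations)
print("\n".join(render(tree)))
```

In this toy input both sub-events are required for the goal, so the root carries an `AND` gate. A real post could yield several disconnected event groups or cyclic relations, which this sketch rejects or does not handle.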