🤖 AI Summary
This work addresses the critical yet underexplored problem of unsupervised log template mining from security incident logs, specifically leveraging large language models (LLMs) in a zero-shot, fully unsupervised setting without labeled data or manual rules. We propose a lightweight fine-tuning framework that integrates semantic clustering, dynamic template abstraction, and log-structure priors to guide template extraction. Our method avoids reliance on handcrafted heuristics or supervised signals while preserving interpretability and efficiency. Evaluated across multiple real-world security log datasets, it achieves 92.1% template accuracy, outperforming state-of-the-art unsupervised baselines by an average of 11.3%. Moreover, it significantly improves downstream tasks, including alert compression and anomaly detection. By moving beyond conventional clustering- and regex-based approaches, this work establishes a reproducible, generalizable, LLM-driven unsupervised paradigm for log understanding.
📝 Abstract
In modern IT systems and computer networks, real-time and offline event log analysis is a crucial part of cyber security monitoring. In particular, event log analysis techniques are essential for detecting cyber attacks in a timely manner and for helping security experts analyze past security incidents. Detecting line patterns, or templates, in unstructured textual event logs has been identified as an important event log analysis task, since the detected templates represent event types in the event log and prepare the logs for downstream online or offline security monitoring tasks. During the last two decades, a number of template mining algorithms have been proposed. However, many of these algorithms rely on traditional data mining techniques, and the use of Large Language Models (LLMs) has so far received less attention. Moreover, most approaches that harness LLMs are supervised, and unsupervised LLM-based template mining remains an understudied area. This paper addresses this research gap and investigates the application of LLMs for unsupervised detection of templates from unstructured security event logs.
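To make the notion of a template concrete, the sketch below mines templates with a simple, traditional (non-LLM) heuristic, loosely in the spirit of similarity-based parsers such as Drain. It is not the LLM-based method investigated in the paper. Everything here is an illustrative assumption: the function names `tokenize` and `mine_templates` are hypothetical, tokens containing digits (IP addresses, ports, PIDs) are pre-masked as variables, lines are compared only against templates with the same token count, and a line joins the most similar template if at least 80% of its positions match (the `threshold=0.8` value is arbitrary). Positions that differ are replaced by the wildcard `<*>`.

```python
import re

WILDCARD = "<*>"
HAS_DIGIT = re.compile(r"\d")


def tokenize(line):
    # Assumption: any token containing a digit is a variable (IP, port, PID).
    return [WILDCARD if HAS_DIGIT.search(t) else t for t in line.split()]


def similarity(template, tokens):
    # Fraction of positions where the template token equals the line token.
    # A wildcard in the template only matches a token that was also masked.
    same = sum(1 for a, b in zip(template, tokens) if a == b)
    return same / len(tokens)


def mine_templates(lines, threshold=0.8):
    """Hypothetical heuristic miner: returns (template, line_count) pairs."""
    templates = []  # each entry: [list_of_template_tokens, count]
    for line in lines:
        tokens = tokenize(line)
        if not tokens:
            continue
        best, best_sim = None, 0.0
        for entry in templates:
            if len(entry[0]) != len(tokens):
                continue  # only compare lines with the same token count
            sim = similarity(entry[0], tokens)
            if sim > best_sim:
                best, best_sim = entry, sim
        if best is not None and best_sim >= threshold:
            # Merge: keep matching tokens, turn differing ones into wildcards.
            best[0] = [a if a == b else WILDCARD for a, b in zip(best[0], tokens)]
            best[1] += 1
        else:
            templates.append([tokens, 1])
    return [(" ".join(t), c) for t, c in templates]


if __name__ == "__main__":
    logs = [
        "Accepted password for alice from 10.0.0.5 port 51234",
        "Accepted password for bob from 10.0.0.7 port 40022",
        "Failed password for root from 10.0.0.9 port 22",
        "Failed password for admin from 10.0.0.12 port 2222",
        "Connection closed by 10.0.0.5",
    ]
    for template, count in mine_templates(logs):
        print(count, template)
    # 2 Accepted password for <*> from <*> port <*>
    # 2 Failed password for <*> from <*> port <*>
    # 1 Connection closed by <*>
```

Each resulting template corresponds to one event type (successful login, failed login, closed connection), which is what downstream monitoring tasks consume. The fixed threshold and digit-masking rule are exactly the kind of hand-tuned choices that unsupervised LLM-based approaches aim to avoid.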