SoK: LLM-based Log Parsing

📅 2025-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional log parsing methods suffer from heavy reliance on handcrafted rules, poor generalizability, and limited scalability. To address these limitations, this paper presents the first systematic, structured survey of 29 large language model (LLM)-driven log parsing approaches. We propose a unified, general-purpose LLM-based log parsing pipeline encompassing prompt engineering, supervised/unsupervised fine-tuning, and efficiency optimization. A multidimensional evaluation framework is established to assess method characteristics, prompt engineering efficacy, and computational efficiency. We conduct reproducible benchmarking across seven open-source LLM-based parsers and widely adopted public datasets, releasing all source code, standardized evaluation results, and comprehensive feature comparison tables. This work advances standardization, transparency, and reproducibility in LLM-based log parsing research, providing both a methodological foundation and practical guidelines for the community.
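The prompt-engineering stage of the pipeline described above can be sketched as few-shot template extraction: the LLM is shown a handful of (log, template) pairs and asked to abstract the dynamic fields of a new message into `<*>` placeholders. The helper name, example templates, and prompt wording below are illustrative assumptions, not the implementation of any specific surveyed parser.

```python
# Illustrative few-shot prompt construction for LLM-based log parsing.
# The examples and instruction text are assumed for demonstration only;
# the 29 surveyed approaches differ in prompt format and demonstrations.

FEW_SHOT_EXAMPLES = [
    ("Connection from 10.0.0.5 closed", "Connection from <*> closed"),
    ("User admin logged in after 3 retries",
     "User <*> logged in after <*> retries"),
]

def build_parsing_prompt(log_line: str) -> str:
    """Assemble a few-shot prompt asking an LLM to replace the
    dynamic parameters of a log message with <*> placeholders."""
    lines = ["Extract the template of each log message, replacing "
             "dynamic parameters with <*>."]
    for log, template in FEW_SHOT_EXAMPLES:
        lines.append(f"Log: {log}\nTemplate: {template}")
    lines.append(f"Log: {log_line}\nTemplate:")
    return "\n".join(lines)

print(build_parsing_prompt("Session 42 expired"))
```

The resulting string would then be sent to the chosen LLM; caching identical templates and batching similar logs are among the efficiency optimizations the survey covers.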

📝 Abstract
Log data, generated by software systems, provides crucial insights for tasks like monitoring, root cause analysis, and anomaly detection. Due to the vast volume of logs, automated log parsing is essential to transform semi-structured log messages into structured representations. Traditional log parsing techniques often require manual configurations, such as defining log formats or labeling data, which limits scalability and usability. Recent advances in large language models (LLMs) have introduced the new research field of LLM-based log parsing, offering potential improvements in automation and adaptability. Despite promising results, there is no structured overview of these approaches since this is a relatively new research field with the earliest advances published in late 2023. This paper systematically reviews 29 LLM-based log parsing methods, comparing their capabilities, limitations, and reliance on manual effort. We analyze the learning and prompt-engineering paradigms employed, efficiency- and effectiveness-enhancing techniques, and the role of LLMs in the parsing process. We aggregate the results of the survey in a large table comprising the characterizing features of LLM-based log parsing approaches and derive the general process of LLM-based log parsing, incorporating all reviewed approaches in a single flow chart. Additionally, we benchmark seven open-source LLM-based log parsers on public datasets and critically assess their reproducibility. Our findings summarize the advances of this new research field and provide insights for researchers and practitioners seeking efficient and user-friendly log parsing solutions, with all code and results made publicly available for transparency.
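To make the parsing target concrete, the sketch below matches a raw log line against an extracted template and recovers its dynamic parameters, i.e., the structured representation the abstract refers to. The regex-based matcher is a minimal illustrative assumption, not the method of any particular parser reviewed in the paper.

```python
import re

def match_template(template: str, log_line: str):
    """Convert a template with <*> placeholders into a regex and
    extract the dynamic parameters from a raw log line.
    Returns the parameter list, or None if the line does not match."""
    pattern = ("^"
               + "(.*?)".join(re.escape(part)
                              for part in template.split("<*>"))
               + "$")
    m = re.match(pattern, log_line)
    return list(m.groups()) if m else None

# Example: the template abstracts the user name and retry count.
params = match_template("User <*> logged in after <*> retries",
                        "User admin logged in after 3 retries")
print(params)  # → ['admin', '3']
```

Grouping all lines that share a template is what turns a raw log stream into the structured input used by downstream tasks such as anomaly detection and root cause analysis.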
Problem

Research questions and friction points this paper is trying to address.

Automating log parsing to handle vast semi-structured data
Overcoming manual configuration limits in traditional log parsing
Systematically reviewing LLM-based log parsing methods and benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based log parsing automation
Systematic review of 29 methods
Benchmarking seven open-source parsers