LLM4Log: A Systematic Review of Large Language Model-based Log Analysis

📅 2026-03-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty of analyzing large-scale, evolving, semi-structured logs under label scarcity and distribution drift, a problem that limits reliability engineering and AIOps. It presents a unified, task-driven taxonomy of log analysis with large language models (LLMs) and systematically surveys their use across the end-to-end pipeline, from logging-statement generation and log parsing to anomaly detection, failure prediction, root cause analysis, and log summarization. From a structured analysis of 145 papers, it identifies five recurring design patterns: prompting and in-context learning, retrieval grounding, fine-tuning, tool and agent augmentation, and verification. It also synthesizes datasets, metrics, and evaluation practices across seven logging tasks, and highlights open challenges in robustness under drift and long-tail events, grounding and faithfulness of operator-facing outputs, and reproducibility, with the aim of guiding reliable real-world adoption.

📝 Abstract

Software systems generate massive, evolving, semi-structured logs that are central to reliability engineering and AIOps, yet difficult to analyze at scale under drift and limited labels. Recent advances in pretrained Transformer models and instruction-tuned large language models (LLMs) have reshaped log analysis by enabling semantic generalization and cross-source evidence integration, but also introducing deployment risks such as context limits, latency/cost, privacy constraints, and hallucinations. This paper presents LLM4Log, a systematic review of LLM-based log analysis across the end-to-end pipeline, from upstream logging-statement generation and maintenance to log parsing/structuring and downstream tasks including anomaly detection, failure prediction, root cause analysis, and log summarization. Following a structured search and manual screening protocol, we completed literature collection in November 2025 and identified 145 unique papers across seven logging tasks. We synthesize the research area through a unified, task-driven taxonomy, summarize common design patterns (prompting/ICL, retrieval grounding, fine-tuning, tool/agent augmentation, and verification), and analyze evaluation practices, datasets, metrics, and reproducibility. Based on these cross-paper analyses, we distill key lessons and open challenges for reliable real-world adoption. We emphasize robustness under drift and long-tail events, grounding and faithfulness for operator-facing outputs, and deployment-oriented designs with verifiable behavior.
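To make two of the design patterns named above more concrete, the following is a minimal sketch of retrieval-grounded in-context prompting for log parsing, followed by a simple verification check. It is an illustration under stated assumptions, not a method taken from the reviewed papers: the example logs, the `<*>` placeholder convention, and the function names `retrieve`, `build_prompt`, and `template_matches` are all hypothetical, and no model is called. A candidate template would come from whatever LLM is used.

```python
import re
from difflib import SequenceMatcher

# Hypothetical labeled examples: (raw log line, template with <*> for variable parts).
EXAMPLES = [
    ("Connection from 10.0.0.5 closed", "Connection from <*> closed"),
    ("Failed to open file /var/log/app.log", "Failed to open file <*>"),
    ("User alice logged in after 3 attempts", "User <*> logged in after <*> attempts"),
]


def retrieve(query, examples, k=2):
    """Retrieval grounding: return the k examples most similar to the query line."""
    ranked = sorted(
        examples,
        key=lambda ex: SequenceMatcher(None, query, ex[0]).ratio(),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, demos):
    """In-context learning: place retrieved demonstrations before the query."""
    parts = ["Extract the log template. Replace variable parts with <*>."]
    for raw, template in demos:
        parts.append(f"Log: {raw}\nTemplate: {template}")
    parts.append(f"Log: {query}\nTemplate:")
    return "\n\n".join(parts)


def template_matches(template, raw):
    """Verification: accept a candidate template only if it matches the raw line."""
    # Each <*> becomes a non-empty wildcard; constant parts are matched literally.
    pattern = ".+?".join(re.escape(piece) for piece in template.split("<*>"))
    return re.fullmatch(pattern, raw) is not None


query = "Connection from 192.168.1.7 closed"
prompt = build_prompt(query, retrieve(query, EXAMPLES, k=2))
# The prompt would be sent to an LLM; its answer is then checked before use.
candidate = "Connection from <*> closed"  # stand-in for a model response
accepted = template_matches(candidate, query)
```

The verification step is deliberately cheap: a template that cannot regenerate the line it was derived from is rejected regardless of how plausible it looks, which is one simple way to guard operator-facing output against hallucinated structure.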

Problem

Research questions and friction points this paper is trying to address.

log analysis

large language models

software reliability

AIOps

data drift

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Log Analysis

Systematic Review