CelerLog: Fast Log Parsing via Dynamic Routing

📅 2026-05-25

🤖 AI Summary

Existing log parsing approaches struggle to balance semantic understanding and computational efficiency: purely statistical methods lack semantic awareness, while relying entirely on large language models (LLMs) incurs high latency and cost. This work proposes a dynamic routing mechanism that adaptively classifies incoming logs into dense and sparse groups. Dense groups are processed with efficient statistical pattern mining, and sparse groups with lightweight LLM-based semantic reasoning. By avoiding unnecessary LLM invocations, the method maintains high parsing accuracy across 14 public datasets. It runs 7.9–18.6× faster than pure LLM approaches and up to 1.5× faster than Drain. It also reduces token consumption by 80.2%–94.1% and the number of LLM calls by 86.4%–90.9%.

📝 Abstract

Log parsing, which transforms raw log messages into structured formats, is a fundamental step in automated log analysis. Existing syntax-based parsers struggle with complex logs because they lack semantic reasoning ability. Emerging LLM-powered semantic parsers achieve high accuracy but suffer from prohibitive latency and token costs because they apply semantic inference to every log. Our key observation is that not all logs require complex semantic understanding: the vast majority exhibit repetitive patterns that can be extracted through straightforward statistical analysis. Driven by this insight, we propose CelerLog, a fast and effective log parser. CelerLog introduces a dynamic routing mechanism that classifies logs into dense and sparse groups. Logs with strong statistical patterns (dense groups) are processed by an efficient statistical processor, whereas sparse groups lacking such patterns are routed to an LLM for semantic inference. This hybrid strategy avoids unnecessary LLM invocations. Extensive experiments on 14 public datasets show that CelerLog outperforms state-of-the-art baselines. It is 7.9× to 18.6× faster than LLM-based methods and up to 1.5× faster than Drain. It also reduces costs, cutting token consumption by 80.2% to 94.1% and LLM invocations by 86.4% to 90.9%.
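The routing idea described in the abstract can be illustrated with a minimal sketch: group logs, send well-populated (dense) groups to a statistical template miner, and send the remaining (sparse) logs to an LLM. The abstract does not specify CelerLog's grouping criterion, density test, or statistical processor, so everything below is an assumption. The grouping key (token count plus first token), the `min_support` threshold, the position-wise template miner, and the names `group_key`, `mine_template`, `route_and_parse`, and `llm_parse` are all hypothetical. The LLM is replaced by a regex stub. This is not the authors' implementation.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def group_key(tokens: List[str]) -> Tuple[int, str]:
    """Hypothetical grouping key: token count plus first token."""
    return (len(tokens), tokens[0] if tokens else "")


def mine_template(group: List[List[str]]) -> str:
    """Statistical processor sketch: keep tokens that are constant at a
    position across the whole group and mask the varying ones as <*>."""
    return " ".join(
        column[0] if len(set(column)) == 1 else "<*>" for column in zip(*group)
    )


def route_and_parse(
    logs: List[str],
    llm_parse: Callable[[str], str],
    min_support: int = 3,  # hypothetical density threshold
) -> Tuple[Dict[str, str], int]:
    """Route dense groups to mine_template and sparse groups to llm_parse.

    Returns a mapping from each distinct log line to its template, plus the
    number of LLM calls made.
    """
    groups: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for line in logs:
        groups[group_key(line.split())].append(line)

    templates: Dict[str, str] = {}
    llm_calls = 0
    for members in groups.values():
        if len(members) >= min_support:
            # Dense group: enough repetition for position-wise statistics.
            template = mine_template([m.split() for m in members])
            for m in members:
                templates[m] = template
        else:
            # Sparse group: too few samples, so defer each line to the LLM.
            for m in members:
                templates[m] = llm_parse(m)
                llm_calls += 1
    return templates, llm_calls


if __name__ == "__main__":
    logs = [
        "Connected to 10.0.0.1 port 22",
        "Connected to 10.0.0.2 port 22",
        "Connected to 10.0.0.3 port 2222",
        "Disk /dev/sda1 at 91% capacity",
    ]

    # Stand-in for a real LLM call: masks digit runs with <*>.
    def fake_llm(line: str) -> str:
        return re.sub(r"\d+", "<*>", line)

    templates, calls = route_and_parse(logs, fake_llm)
    for log, template in templates.items():
        print(f"{template!r:40} <- {log}")
    print("LLM calls:", calls)  # LLM calls: 1
```

How the example routes:
- The three "Connected" lines form a dense group and are parsed statistically as `Connected to <*> port <*>` without any LLM call.
- The lone "Disk" line is sparse and is the only line sent to the (stub) LLM.

In this sketch, `min_support` is the knob that trades LLM invocations against the risk of mining templates from too few samples.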

Problem

Research questions and friction points this paper is trying to address.

log parsing

semantic reasoning

latency

token cost

structured logging

Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic routing

log parsing

large language model