InferLog: Accelerating LLM Inference for Online Log Parsing via ICL-oriented Prefix Caching

📅 2025-07-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address high inference latency, low throughput, and privacy sensitivity in online log parsing, this work proposes a prefix-aware in-context learning (ICL) optimization framework coupled with a meta-learning-driven dynamic configuration tuning method. It is the first work to systematically apply LLM inference optimization to real-time log parsing: prefix caching improves KV cache reuse, while task-adaptive ICL example selection and vLLM scheduling collectively reduce per-inference overhead, and meta-learning enables rapid identification of optimal configurations across diverse log formats. Experiments on the LogHub benchmark demonstrate that InferLog achieves a 2.8× average speedup and 3.1× higher throughput over state-of-the-art LLM-based baselines, while maintaining state-of-the-art parsing accuracy (F1 ≥ 0.97). The method thus simultaneously advances efficiency, accuracy, and the security of on-device deployment.

📝 Abstract
Modern software systems generate massive volumes of runtime logs, necessitating efficient and accurate log parsing to enable critical downstream tasks such as anomaly detection and root cause analysis. Recently, large language models (LLMs) have achieved advanced accuracy on log parsing, but their deployment in production environments faces two major limitations: (1) the privacy risks associated with commercial LLMs, driving the adoption of local deployment, and (2) the stringent latency and throughput requirements imposed by high-volume log streams, which existing LLM-based parsers fail to meet. Although recent efforts have reduced the number of LLM queries, they overlook the high latency of LLM invocations, where concurrent log parsing requests can cause severe performance degradation of the LLM inference system. In this study, we present InferLog, the first LLM inference optimization method for online log parsing. Our key insight is that inference efficiency, rather than parsing accuracy, is the vital bottleneck in LLM-based online log parsing. InferLog accelerates inference by designing (1) a Prefix-aware ICL Refinement policy that refines the examples and permutation of in-context learning to improve prefix caching efficiency, and (2) a rapid, task-specific configuration tuning pipeline based on meta-learning that finds the optimal LLM scheduling-related configuration for dynamic log parsing workloads. Experimental results on the Loghub dataset with vLLM demonstrate that InferLog significantly outperforms existing inference optimization methods and markedly accelerates the state-of-the-art LLM-based log parser without compromising parsing accuracy.
Problem

Research questions and friction points this paper is trying to address.

Optimizing LLM inference for efficient online log parsing
Reducing latency and increasing throughput for high-volume log streams
Enhancing prefix caching efficiency for ICL-based log parsing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prefix-aware ICL Refinement for caching efficiency
Meta-learning based configuration tuning pipeline
Optimized LLM scheduling for dynamic workloads
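The core idea behind prefix-aware ICL refinement is that prefix caching only reuses KV-cache entries when two prompts share an identical leading token sequence, so the *order* of in-context examples matters as much as their selection. A minimal illustrative sketch (the helper names and prompt format below are assumptions, not the paper's actual implementation): if selected examples are always emitted in a fixed canonical order drawn from a shared pool, prompts that pick overlapping example sets automatically share the longest possible common prefix.

```python
def build_prompt(selected_examples, example_pool, query):
    """Build an ICL prompt whose demonstrations appear in a fixed canonical
    order (their position in the shared pool), so prompts that select
    overlapping example sets share a long common prefix for KV-cache reuse."""
    ordered = sorted(selected_examples, key=example_pool.index)
    demos = "\n".join(f"Log: {log}\nTemplate: {tpl}" for log, tpl in ordered)
    return f"{demos}\nLog: {query}\nTemplate:"

def shared_prefix_len(a, b):
    """Length of the common character prefix between two prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

With this canonical ordering, two parsing requests that both selected example `("a", "A")` produce prompts beginning with the same demonstration block, so a prefix-caching engine such as vLLM can skip recomputing attention over that shared span.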