A Word is Worth 4-bit: Efficient Log Parsing with Binary Coded Decimal Recognition

📅 2025-06-01

📈 Citations: 0

✨ Influential: 0

career value

131K/year

🤖 AI Summary

Existing log parsers employ coarse-grained templates, limiting their effectiveness for high-precision downstream tasks. This paper proposes a lightweight character-level log parser that innovatively models log templates as Binary-Coded Decimal (BCD) sequences—achieving 4-bit compact representation while preserving semantic interpretability. Our method integrates character-level embeddings, a dynamic aggregation mechanism, and BCD sequence prediction, enabling LLM-level parsing accuracy without large-model parameters. Evaluated on LogHub-2k and an industrial labeled dataset, our parser matches the accuracy of the best LLM-based parser while accelerating inference by 3.2× and reducing memory footprint by 87%. It significantly outperforms existing semantic log parsers in both efficiency and accuracy, offering a practical, scalable solution for production-grade log analysis.

Technology Category

Application Category

📝 Abstract

System-generated logs are typically converted into categorical log templates through parsing. These templates are crucial for generating actionable insights in various downstream tasks. However, existing parsers often fail to capture fine-grained template details, leading to suboptimal accuracy and reduced utility in downstream tasks requiring precise pattern identification. We propose a character-level log parser utilizing a novel neural architecture that aggregates character embeddings. Our approach estimates a sequence of binary-coded decimals to achieve highly granular log templates extraction. Our low-resource character-level parser, tested on revised Loghub-2k and a manually annotated industrial dataset, matches LLM-based parsers in accuracy while outperforming semantic parsers in efficiency.

Problem

Research questions and friction points this paper is trying to address.

Improves fine-grained log template accuracy

Enhances efficiency in log pattern identification

Reduces resource usage in log parsing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Character-level log parser with neural architecture

Binary-coded decimals for granular template extraction

Low-resource efficiency matching LLM-based parsers

🔎 Similar Papers

Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging