🤖 AI Summary
Existing log parsers employ coarse-grained templates, limiting their effectiveness for high-precision downstream tasks. This paper proposes a lightweight character-level log parser that innovatively models log templates as Binary-Coded Decimal (BCD) sequences—achieving 4-bit compact representation while preserving semantic interpretability. Our method integrates character-level embeddings, a dynamic aggregation mechanism, and BCD sequence prediction, enabling LLM-level parsing accuracy without large-model parameters. Evaluated on LogHub-2k and an industrial labeled dataset, our parser matches the accuracy of the best LLM-based parser while accelerating inference by 3.2× and reducing memory footprint by 87%. It significantly outperforms existing semantic log parsers in both efficiency and accuracy, offering a practical, scalable solution for production-grade log analysis.
📝 Abstract
System-generated logs are typically converted into categorical log templates through parsing. These templates are crucial for generating actionable insights in various downstream tasks. However, existing parsers often fail to capture fine-grained template details, leading to suboptimal accuracy and reduced utility in downstream tasks requiring precise pattern identification. We propose a character-level log parser utilizing a novel neural architecture that aggregates character embeddings. Our approach estimates a sequence of binary-coded decimals to achieve highly granular log templates extraction. Our low-resource character-level parser, tested on revised Loghub-2k and a manually annotated industrial dataset, matches LLM-based parsers in accuracy while outperforming semantic parsers in efficiency.