On Sequence-to-Sequence Models for Automated Log Parsing

📅 2026-02-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Automated log parsing faces challenges such as heterogeneous log formats, data distribution shifts, and the fragility of rule-based methods. This work systematically evaluates four sequence modeling architectures—Transformer, Mamba, unidirectional LSTM, and bidirectional LSTM—within a unified framework, employing character-level tokenization and the relative Levenshtein edit distance as the primary metric, complemented by statistical significance testing. The study investigates how architectural choices, representation strategies, sequence length, and training data volume affect both performance and computational cost. Experimental results show that the Transformer achieves the lowest average relative edit distance (0.111), reducing parsing error rates by 23.4%, while Mamba attains comparable accuracy with substantially lower computational overhead. Both models also demonstrate superior sample efficiency. This study presents the first comprehensive comparison of state-of-the-art sequence models for log parsing and identifies key factors underlying efficient modeling paradigms.

📝 Abstract
Log parsing is a critical standard operating procedure in software systems, enabling monitoring, anomaly detection, and failure diagnosis. However, automated log parsing remains challenging due to heterogeneous log formats, distribution shifts between training and deployment data, and the brittleness of rule-based approaches. This study aims to systematically evaluate how sequence modelling architecture, representation choice, sequence length, and training data availability influence automated log parsing performance and computational cost. We conduct a controlled empirical study comparing four sequence modelling architectures: Transformer, Mamba state-space, unidirectional LSTM, and bidirectional LSTM models. In total, 396 models are trained across multiple dataset configurations and evaluated using relative Levenshtein edit distance with statistical significance testing. Transformer achieves the lowest mean relative edit distance (0.111), followed by Mamba (0.145), uni-LSTM (0.186), and bi-LSTM (0.265), where lower values are better. Mamba provides competitive accuracy with substantially lower computational cost. Character-level tokenization generally improves performance, sequence length has negligible practical impact on Transformer accuracy, and both Mamba and Transformer demonstrate stronger sample efficiency than recurrent models. Overall, Transformers reduce parsing error by 23.4%, while Mamba is a strong alternative under data or compute constraints. These results also clarify the roles of representation choice, sequence length, and sample efficiency, providing practical guidance for researchers and practitioners.
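The headline metric, relative Levenshtein edit distance, can be sketched in a few lines of Python. Note that the paper's exact normalization is not given on this page, so dividing by the reference length is an assumption here; the `levenshtein` and `relative_edit_distance` names are illustrative, not from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance with unit-cost
    # insertions, deletions, and substitutions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]

def relative_edit_distance(predicted: str, reference: str) -> float:
    # Assumed normalization: divide by the reference length, so 0.0 is an
    # exact match and values near 1.0 mean the prediction is mostly wrong.
    return levenshtein(predicted, reference) / max(len(reference), 1)
```

Under this reading, the reported Transformer mean of 0.111 would correspond to roughly one character-level edit per nine characters of the ground-truth template.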
Problem

Research questions and friction points this paper is trying to address.

log parsing
heterogeneous log formats
distribution shift
automated log analysis
sequence-to-sequence modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

log parsing
sequence-to-sequence modeling
Transformer
Mamba
sample efficiency
Adam Sorrenti
Department of Computer Science, Toronto Metropolitan University, 350 Victoria Street, Toronto, M5B 2K3, Ontario, Canada
Andriy Miranskyy
Toronto Metropolitan University (formerly Ryerson University)
large-scale software systems, quantum software engineering