🤖 AI Summary
This work proposes DeLog, a pattern-driven log compression framework that challenges the conventional reliance on parsing accuracy for effective compression. Traditional parser-based approaches perform poorly on complex production logs, and high parsing accuracy does not necessarily translate into a high compression ratio. The study finds that the key to better compression is partitioning tokens into low-entropy, highly compressible groups rather than parsing logs accurately. To this end, the authors introduce a Pattern Signature Synthesis mechanism that enables efficient pattern-based grouping and encoding. Experiments on 16 public datasets and 10 production datasets show that the method achieves state-of-the-art compression ratio and speed.
📝 Abstract
Parser-based log compression, which separates static templates from dynamic variables, is a promising approach to exploit the unique structure of log data. However, its performance on complex production logs is often unsatisfactory. This performance gap coincides with a known degradation in the accuracy of its core log parsing component on such data, motivating our investigation into a foundational yet unverified question: does higher parsing accuracy necessarily lead to a better compression ratio? To answer this, we conduct the first empirical study quantifying this relationship and find that higher parsing accuracy does not guarantee a better compression ratio. Instead, our findings reveal that compression ratio is dictated by effective pattern-based grouping and encoding, i.e., the partitioning of tokens into low-entropy, highly compressible groups. Guided by this insight, we design DeLog, a novel log compressor that implements a Pattern Signature Synthesis mechanism to achieve efficient pattern-based grouping. On 16 public and 10 production datasets, DeLog achieves state-of-the-art compression ratio and speed.
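
The idea of partitioning tokens into low-entropy groups can be illustrated with a minimal Python sketch. This is not DeLog's Pattern Signature Synthesis, whose details the abstract does not give. The `signature` function is a hypothetical stand-in that maps each token to a coarse character shape, and `group_columns` and `shannon_entropy` are illustrative helper names. The sketch only shows that lines sharing a signature can be stored column by column, so that each column holds similar tokens.

```python
import math
import re
from collections import Counter, defaultdict


def shannon_entropy(tokens):
    """Empirical Shannon entropy (bits per token) of a list of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def signature(line):
    """Hypothetical signature (an assumption, not the paper's definition):
    runs of digits become '0' and runs of letters become 'a'."""
    shapes = []
    for tok in line.split():
        shape = re.sub(r"[0-9]+", "0", tok)
        shape = re.sub(r"[A-Za-z]+", "a", shape)
        shapes.append(shape)
    return " ".join(shapes)


def group_columns(lines):
    """Group lines by signature, then transpose each group into token columns."""
    groups = defaultdict(list)
    for line in lines:
        groups[signature(line)].append(line.split())
    # Lines with the same signature have the same token count, so zip is safe.
    return {sig: [list(col) for col in zip(*rows)] for sig, rows in groups.items()}


if __name__ == "__main__":
    logs = [
        "Connected to 10.0.0.1 port 22",
        "Connected to 10.0.0.7 port 22",
        "Disk usage 91%",
        "Connected to 10.0.0.1 port 8080",
    ]
    for sig, columns in group_columns(logs).items():
        print(sig)
        for col in columns:
            print(f"  entropy={shannon_entropy(col):.3f}  {col}")
```

In this toy input, columns that hold only static words have zero entropy, while the variable columns (addresses, ports) carry all the information. No accurate template extraction is needed for the grouping to help, which is the point the abstract makes about grouping versus parsing accuracy.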