LogFold: Compressing Logs with Structured Tokens and Hybrid Encoding

📅 2026-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes LogFold, a novel log compression approach that addresses two limitations of existing methods: they fail to fully exploit redundancy within structured tokens, and they lack fine-grained encoding strategies tailored to different token types. LogFold is the first to uncover a delimiter-skeleton redundancy pattern inherent in structured tokens, and it introduces a type-aware hybrid encoding mechanism that optimizes compression separately for structured, unstructured, and static tokens. The system integrates a token analyzer, a skeleton pattern mining module, a type-adaptive encoder, and a packing compression component. Evaluated on 16 public log datasets, LogFold improves the average compression ratio by 11.11% while sustaining a compression throughput of 9.842 MB/s, demonstrating its effectiveness and robustness.
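The paper's exact mining algorithm is not reproduced in this summary. As a rough illustration only, assuming a "delimiter skeleton" means a token with its alphanumeric runs masked out so that only the delimiter layout remains, skeleton-based grouping might look like this minimal sketch (the function names are hypothetical, not LogFold's API):

```python
import re
from collections import defaultdict

def delimiter_skeleton(token: str) -> str:
    """Mask each maximal alphanumeric run, keeping only delimiters.
    Tokens that share a skeleton (e.g. all IPv4:port pairs) are highly
    redundant, which is the pattern LogFold is described as exploiting."""
    return re.sub(r"[A-Za-z0-9]+", "*", token)

def group_by_skeleton(tokens):
    """Bucket tokens by their delimiter skeleton for joint encoding."""
    groups = defaultdict(list)
    for t in tokens:
        groups[delimiter_skeleton(t)].append(t)
    return dict(groups)

tokens = ["10.0.0.1:8080", "10.0.0.2:9090", "2026-03-20", "2026-03-21"]
print(group_by_skeleton(tokens))
# {'*.*.*.*:*': ['10.0.0.1:8080', '10.0.0.2:9090'],
#  '*-*-*': ['2026-03-20', '2026-03-21']}
```

Once grouped, each bucket needs to store its skeleton only once, with the variable alphanumeric fields encoded separately, which is where the redundancy saving comes from.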

📝 Abstract
Logs are essential for diagnosing failures and conducting retrospective studies, leading many software organizations to retain log messages for a long time. Nevertheless, the volume of generated log data grows rapidly as software systems scale, necessitating effective compression methods. Apart from general-purpose compressors (e.g., Gzip, Bzip2), many recent studies have developed log-specific compression algorithms, but these offer suboptimal performance because they (1) overlook redundancies within certain complex tokens and (2) lack a fine-grained encoding strategy for diverse token types. This work uncovers a new redundancy pattern in structured tokens and proposes a new type-aware encoding strategy to improve log compression. Building on this insight, we introduce LogFold, a novel log compression method consisting of four components: a token analyzer that classifies tokens as structured, unstructured, or static; a processor that mines recurring patterns within structured tokens based on their delimiter skeletons; a hybrid encoder that tailors data representation to token type; and a packer that compresses the output into an archive file. Extensive experiments on 16 public log datasets demonstrate that LogFold surpasses state-of-the-art baselines, achieving an average compression ratio improvement of 11.11% with a compression speed of 9.842 MB/s. Ablation studies further indicate the importance of each component. We also conduct sensitivity analyses to verify LogFold's robustness and stability across various internal settings.
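The abstract's classify-then-encode pipeline can be sketched in a few lines. This is an assumption-laden toy, not LogFold's implementation: the classification heuristic (static = known dictionary word, structured = alphanumerics mixed with delimiters, unstructured = everything else) and the per-type encodings are invented for illustration.

```python
import re

def classify(token: str, static_dict: set) -> str:
    """Rough token typing: known constants are static; tokens mixing
    alphanumerics with delimiters are structured; the rest unstructured."""
    if token in static_dict:
        return "static"
    if re.search(r"[A-Za-z0-9]", token) and re.search(r"[^A-Za-z0-9]", token):
        return "structured"
    return "unstructured"

def encode(token: str, static_dict: set, dict_index: dict):
    """Route each token to a type-specific representation (hybrid encoding)."""
    kind = classify(token, static_dict)
    if kind == "static":
        return ("S", dict_index[token])             # dictionary reference
    if kind == "structured":
        skeleton = re.sub(r"[A-Za-z0-9]+", "*", token)
        fields = re.findall(r"[A-Za-z0-9]+", token)
        return ("T", skeleton, fields)              # skeleton + variable fields
    return ("U", token)                             # stored as-is

static_dict = {"INFO", "ERROR"}
dict_index = {w: i for i, w in enumerate(sorted(static_dict))}
print(encode("INFO", static_dict, dict_index))           # → ('S', 1)
print(encode("10.0.0.1:8080", static_dict, dict_index))
```

A final packing stage would then run a general-purpose compressor over each encoded stream, which is roughly what the abstract's "packer" component describes.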
Problem

Research questions and friction points this paper is trying to address.

log compression
structured tokens
redundancy
encoding strategy
token types
Innovation

Methods, ideas, or system contributions that make the work stand out.

structured tokens
hybrid encoding
log compression
token-type awareness
redundancy mining