🤖 AI Summary
Modern software and IoT systems generate massive volumes of log data, up to tens of petabytes per day, leading to prohibitive storage and transmission costs. To address this, we propose LogLite: a lightweight, plug-and-play, streaming, lossless log compression algorithm. Unlike prior approaches, LogLite requires no predefined rules, model training, or domain-specific prior knowledge. Instead, it identifies four structural regularities in public log datasets and leverages them to design real-time semantic unit parsing and adaptive encoding. It natively supports both TEXT and JSON formats and adapts dynamically as log formats evolve. Compared with state-of-the-art baselines, LogLite achieves Pareto-optimal trade-offs between compression ratio and speed in most scenarios while maintaining high-throughput streaming processing: it improves average compression ratio by up to 67.8% and compression speed by up to 2.7×, reducing end-to-end logging overhead across the log life cycle.
📝 Abstract
Log data is a vital resource for capturing system events and states. With the increasing complexity and widespread adoption of modern software systems and IoT devices, the daily volume of log generation has surged to tens of petabytes, leading to significant collection and storage costs. To address this challenge, lossless log compression has emerged as an effective solution, enabling substantial resource savings without compromising log information. In this paper, we first conduct a characterization study on extensive public log datasets and identify four key observations. Building on these insights, we propose LogLite, a lightweight, plug-and-play, streaming lossless compression algorithm designed to handle both TEXT and JSON logs throughout their life cycle. LogLite requires no predefined rules or pre-training and is inherently adaptable to evolving log structures. Our evaluation shows that, compared to state-of-the-art baselines, LogLite achieves Pareto optimality in most scenarios, delivering an average improvement of up to 67.8% in compression ratio and up to 2.7$\times$ in compression speed.