LogLite: Lightweight Plug-and-Play Streaming Log Compression

📅 2025-07-14

🤖 AI Summary
Modern software and IoT systems generate massive volumes of log data, up to tens of petabytes per day, leading to prohibitive storage and transmission costs. To address this, we propose LogLite: a lightweight, plug-and-play, streaming, lossless log compression algorithm. Unlike prior approaches, LogLite requires no predefined rules, pre-training, or domain-specific prior knowledge. Instead, it identifies four structural regularities in public log datasets and uses them to design real-time semantic unit parsing and adaptive encoding. It natively supports both TEXT and JSON formats and adapts to evolving log structures. Compared with state-of-the-art baselines, LogLite achieves Pareto-optimal trade-offs between compression ratio and speed in most scenarios while maintaining high-throughput streaming processing: it improves average compression ratio by up to 67.8% and compression speed by up to 2.7×, reducing logging overhead across the log life cycle.

📝 Abstract
Log data is a vital resource for capturing system events and states. With the increasing complexity and widespread adoption of modern software systems and IoT devices, the daily volume of log generation has surged to tens of petabytes, leading to significant collection and storage costs. To address this challenge, lossless log compression has emerged as an effective solution, enabling substantial resource savings without compromising log information. In this paper, we first conduct a characterization study on extensive public log datasets and identify four key observations. Building on these insights, we propose LogLite, a lightweight, plug-and-play, streaming lossless compression algorithm designed to handle both TEXT and JSON logs throughout their life cycle. LogLite requires no predefined rules or pre-training and is inherently adaptable to evolving log structures. Our evaluation shows that, compared to state-of-the-art baselines, LogLite achieves Pareto optimality in most scenarios, delivering an average improvement of up to 67.8% in compression ratio and up to 2.7$\times$ in compression speed.
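
To make the terms "streaming", "lossless", and "compression ratio" concrete, here is a minimal sketch that uses Python's standard `zlib` module as a general-purpose stand-in. It is not LogLite and does not reproduce the paper's algorithm; the helper names `stream_compress` and `compression_ratio` and the synthetic log lines are hypothetical, introduced only for illustration.

```python
import zlib


def stream_compress(lines):
    """Compress log lines one at a time, yielding compressed chunks as they become available."""
    comp = zlib.compressobj(level=6)  # general-purpose baseline, NOT LogLite
    for line in lines:
        chunk = comp.compress(line.encode("utf-8") + b"\n")
        if chunk:  # zlib may buffer input and emit nothing for a given line
            yield chunk
    yield comp.flush()  # emit whatever is still buffered


def compression_ratio(original_size, compressed_size):
    """Ratio of original size to compressed size; higher means better compression."""
    return original_size / compressed_size


# Synthetic, highly repetitive log lines (illustrative data, not from the paper).
logs = [
    f"2025-07-14 12:00:{i % 60:02d} INFO worker-{i % 4} request id={i} served"
    for i in range(1000)
]
raw = "".join(line + "\n" for line in logs).encode("utf-8")
packed = b"".join(stream_compress(logs))

# Lossless: decompressing recovers the original bytes exactly.
restored = zlib.decompress(packed)
ratio = compression_ratio(len(raw), len(packed))
print(f"lossless={restored == raw} ratio={ratio:.1f}")
```

The compressor never needs the whole log in memory, which is the property "streaming" refers to; the byte-for-byte round-trip check is what "lossless" means in practice.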
Problem

Research questions and friction points this paper is trying to address.

Reduce massive log storage costs efficiently
Compress logs without losing critical information
Handle diverse log formats adaptively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight plug-and-play streaming compression
Handles TEXT and JSON logs adaptively
No predefined rules or pre-training needed