LogLite: Lightweight Plug-and-Play Streaming Log Compression

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Modern software and IoT systems generate massive volumes of log data, up to tens of petabytes per day, leading to prohibitive storage and transmission costs. To address this, we propose LogLite: a lightweight, plug-and-play, streaming, lossless log compression algorithm. Unlike prior approaches, LogLite requires no predefined rules, model training, or domain-specific prior knowledge. Instead, it identifies four structural regularities from public log datasets and leverages them to design real-time semantic unit parsing and adaptive encoding. It natively supports both TEXT and JSON formats and dynamically adapts to format evolution. Compared to state-of-the-art baselines, LogLite achieves Pareto-optimal trade-offs between compression ratio and speed in most scenarios while maintaining high-throughput streaming processing: it improves the average compression ratio by up to 67.8% and compression speed by up to 2.7×, significantly reducing end-to-end logging overhead across the log life cycle.

📝 Abstract
Log data is a vital resource for capturing system events and states. With the increasing complexity and widespread adoption of modern software systems and IoT devices, the daily volume of log generation has surged to tens of petabytes, leading to significant collection and storage costs. To address this challenge, lossless log compression has emerged as an effective solution, enabling substantial resource savings without compromising log information. In this paper, we first conduct a characterization study on extensive public log datasets and identify four key observations. Building on these insights, we propose LogLite, a lightweight, plug-and-play, streaming lossless compression algorithm designed to handle both TEXT and JSON logs throughout their life cycle. LogLite requires no predefined rules or pre-training and is inherently adaptable to evolving log structures. Our evaluation shows that, compared to state-of-the-art baselines, LogLite achieves Pareto optimality in most scenarios, delivering an average improvement of up to 67.8% in compression ratio and up to 2.7$\times$ in compression speed.
Problem

Research questions and friction points this paper is trying to address.

Reduce massive log storage costs efficiently
Compress logs without losing critical information
Handle diverse log formats adaptively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight plug-and-play streaming compression
Handles TEXT and JSON logs adaptively
No predefined rules or pre-training needed
Benzhao Tang
Cyberspace Institute of Advanced Technology of Guangzhou University & Huangpu Research School of Guangzhou University
Shiyu Yang
Cyberspace Institute of Advanced Technology of Guangzhou University & Huangpu Research School of Guangzhou University
Zhitao Shen
Ant Group
database, data storage
Wenjie Zhang
University of New South Wales
Xuemin Lin
Shanghai Jiao Tong University
Zhihong Tian
Guangdong Key Laboratory of Industrial Control System Security