Lossless Compression of Time Series Data: A Comparative Study

📅 2025-10-08
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
To address the need for efficient lossless compression of time-series data, this paper proposes a two-stage unified framework comprising data transformation (e.g., differencing, predictive coding) followed by entropy coding (e.g., Huffman, arithmetic coding). We conduct the largest end-to-end lossless compression benchmark to date, evaluating combinations of transformation and coding methods across synthetic and diverse real-world time-series datasets. A standardized evaluation protocol and systematic ablation analysis are introduced to quantify individual component contributions and reveal the sensitivity of algorithm performance to intrinsic data characteristics. Key contributions include: (i) empirical validation that holistic pipeline design—not isolated components—dominates overall compression efficacy; (ii) identification of critical synergies among transformation and coding modules; and (iii) establishment of principled guidelines for scenario-aware algorithm selection and customized pipeline construction, bridging theoretical insight with practical deployment.
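
The two-stage pipeline described above is easy to prototype. Below is a minimal Python sketch, not the authors' code: a differencing transform (stage 1) followed by zlib's DEFLATE, whose entropy stage is Huffman-based, standing in for a dedicated entropy coder (stage 2). The ramp series is a synthetic, purely illustrative input.

```python
import zlib

import numpy as np


def delta_transform(values: np.ndarray) -> np.ndarray:
    """Stage 1: differencing. Keep the first value, then successive deltas.
    (Invertible: np.cumsum(residuals) restores the original series.)"""
    return np.concatenate(([values[0]], np.diff(values)))


def compress(values: np.ndarray) -> bytes:
    """Stage 2: entropy-code the residuals. zlib's DEFLATE (Huffman-based)
    stands in for a dedicated entropy coder."""
    return zlib.compress(delta_transform(values).astype(np.int32).tobytes(), 9)


# A slowly varying ramp: residuals are near zero after differencing, so the
# transformed stream entropy-codes far better than the raw one.
series = np.arange(0, 100_000, dtype=np.int64) // 7
raw = zlib.compress(series.astype(np.int32).tobytes(), 9)
print(f"entropy coding only: {len(raw):6d} bytes")
print(f"delta + entropy:     {len(compress(series)):6d} bytes")
```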

📝 Abstract
Our increasingly digital and connected world generates unprecedented amounts of data, which must be efficiently managed, transmitted, and stored to preserve resources and allow scalability. Data compression has long been a key technology for this, resulting in a vast landscape of available techniques. This largest-to-date study analyzes and compares lossless data compression methods for time series data. We present a unified framework encompassing two stages: data transformation and entropy encoding. We evaluate compression algorithms across both synthetic and real-world datasets with varying characteristics. Through ablation studies at each compression stage, we isolate the impact of individual components on overall compression performance, revealing the strengths and weaknesses of different algorithms when facing diverse time series properties. Our study underscores the importance of well-configured and complete compression pipelines beyond individual components or algorithms; it offers a comprehensive guide for selecting and composing the most appropriate compression algorithms tailored to specific datasets.
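
One way to read the ablation protocol in the abstract is as a grid over transform × coder combinations, each scored by compression ratio. The following hypothetical sketch uses Python's standard-library codecs (DEFLATE, bzip2, LZMA) as stand-ins; the transform and coder names are illustrative assumptions, not the paper's actual components.

```python
import bz2
import lzma
import zlib

import numpy as np

# Stage-1 transforms and stage-2 coders. All are stand-ins chosen for
# illustration; the paper's actual components may differ.
TRANSFORMS = {
    "identity": lambda x: x,
    "delta":    lambda x: np.concatenate(([x[0]], np.diff(x))),
    "delta2":   lambda x: np.concatenate((x[:2], np.diff(x, n=2))),
}
CODERS = {
    "deflate": lambda b: zlib.compress(b, 9),
    "bzip2":   lambda b: bz2.compress(b, 9),
    "lzma":    lambda b: lzma.compress(b),
}


def ablate(series: np.ndarray) -> dict:
    """Score every transform x coder pipeline by compression ratio
    (raw bytes / compressed bytes; higher is better)."""
    raw_size = series.astype(np.int32).nbytes
    results = {}
    for t_name, transform in TRANSFORMS.items():
        payload = transform(series).astype(np.int32).tobytes()
        for c_name, coder in CODERS.items():
            results[(t_name, c_name)] = raw_size / len(coder(payload))
    return results


# Synthetic drifting random walk, purely illustrative.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.integers(-2, 5, size=50_000))
for (t_name, c_name), ratio in sorted(ablate(walk).items(),
                                      key=lambda kv: -kv[1]):
    print(f"{t_name:>8} + {c_name:<7} ratio = {ratio:5.2f}")
```

On a drifting series like this, the differencing pipelines dominate regardless of coder, echoing the paper's point that the pairing of stages, not either component in isolation, determines overall compression efficacy.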
Problem

Research questions and friction points this paper addresses.

Evaluating lossless compression methods for time series data
Comparing algorithm performance across diverse synthetic and real datasets
Identifying optimal compression pipeline configurations for specific data characteristics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework with transformation and encoding stages
Ablation studies isolating component impact on performance
Comprehensive guide for tailored compression algorithm selection
👥 Authors
Jonas G. Matt
Automatic Control Laboratory, ETH Zürich, Zürich, Switzerland
Pengcheng Huang
Computer Engineering Group, ETH Zürich (Intelligent Learning Systems, Cyber Physical Systems)
Balz Maag
Corporate Research Center, ABB, Baden-Dättwil, Switzerland