Shrink: Data Compression by Semantic Extraction and Residuals Encoding

📅 2024-10-09
🏛️ BigData Congress [Services Society]
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the trade-off between high-fidelity reconstruction and high compression ratios in IoT time-series data compression, this paper proposes a dual-path compression framework that combines semantic segments with residuals. First, adaptive dynamic error thresholds are employed to extract linear semantic segments and construct a reusable knowledge base. Second, quantized encoding is applied to the fitting residuals, enabling unified support for multi-level lossy and lossless reconstruction. The key innovations include: (i) the first knowledge-driven paradigm integrating segmented linear fitting with residual compression; (ii) dynamic error thresholding adapted to local data characteristics; and (iii) scalability, in that the compression ratio improves with increasing data volume. Evaluated on multiple real-world IoT datasets, the method achieves 2–5× higher compression ratios than state-of-the-art approaches, near-zero reconstruction error, and end-to-end latency under 10 ms.

📝 Abstract
The distributed data infrastructure in Internet of Things (IoT) ecosystems requires efficient data-series compression methods that can also meet different accuracy demands. However, the performance of existing compression methods degrades sharply when ultra-accurate data recovery is required. In this paper, we introduce Shrink, a novel, highly accurate data compression method that offers a higher compression ratio and lower runtime than prior compressors. Shrink extracts data semantics in the form of linear segments to construct a compact knowledge base, using a dynamic error threshold that adapts to data characteristics. It then captures the remaining data details as residuals to support lossy compression at diverse resolutions as well as lossless compression. Because Shrink effectively identifies repeated semantics, its compression ratio increases with data size. Our experimental evaluation demonstrates that Shrink outperforms state-of-the-art methods, achieving a twofold to fivefold improvement in compression ratio depending on the dataset.
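
The abstract describes extracting linear segments under an error threshold. As a rough illustration only, the sketch below uses a simple greedy piecewise-linear fit with a fixed threshold `eps`; it is not Shrink's algorithm, and the dynamic threshold and knowledge base from the paper are not modeled. The names `fit_segments`, `predict`, and `eps` are hypothetical.

```python
from typing import List, Tuple

# (start index, end index inclusive, slope, value at start); hypothetical layout
Segment = Tuple[int, int, float, float]


def fit_segments(values: List[float], eps: float) -> List[Segment]:
    """Greedy piecewise-linear fit with a fixed error bound (an assumption).

    Each segment is a line anchored at its first point. The segment is
    extended while some slope keeps every covered point within eps.
    """
    segments: List[Segment] = []
    n = len(values)
    start = 0
    while start < n:
        lo, hi = float("-inf"), float("inf")  # feasible slope interval
        end = start
        for i in range(start + 1, n):
            dx = i - start
            # Slopes that keep point i within eps of the anchored line.
            new_lo = max(lo, (values[i] - eps - values[start]) / dx)
            new_hi = min(hi, (values[i] + eps - values[start]) / dx)
            if new_lo > new_hi:
                break  # no single slope fits point i as well
            lo, hi = new_lo, new_hi
            end = i
        slope = 0.0 if end == start else (lo + hi) / 2
        segments.append((start, end, slope, values[start]))
        start = end + 1
    return segments


def predict(segments: List[Segment]) -> List[float]:
    """Rebuild the approximation described by the segments."""
    out: List[float] = []
    for start, end, slope, base in segments:
        out.extend(base + slope * (i - start) for i in range(start, end + 1))
    return out
```

In this sketch, a larger `eps` yields fewer, longer segments at the cost of larger residuals; the differences between the original values and `predict(...)` are what a residual encoder would then store.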

Problem

Research questions and friction points this paper is trying to address.

Efficient data-series compression for IoT ecosystems
Ultra-accurate data recovery in compression methods
Dynamic error threshold adaptation for data characteristics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic extraction for compression
Dynamic error threshold adaptation
Residual encoding for flexibility
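
The residual encoding idea listed above can be sketched with uniform quantization. This is a minimal illustration under stated assumptions, not the paper's encoder: `quantize_residuals`, `reconstruct`, and the `step` parameter are hypothetical, and entropy coding of the integer codes is omitted. A smaller `step` gives a finer resolution, and for data with fixed decimal precision a `step` equal to that precision recovers the values up to floating-point rounding.

```python
from typing import List


def quantize_residuals(residuals: List[float], step: float) -> List[int]:
    """Map each residual to the nearest integer multiple of step."""
    return [round(r / step) for r in residuals]


def reconstruct(prediction: List[float], codes: List[int], step: float) -> List[float]:
    """Add the dequantized residuals back to the base prediction."""
    return [p + c * step for p, c in zip(prediction, codes)]


if __name__ == "__main__":
    # Hypothetical base prediction (e.g. from a linear segment) and raw data.
    prediction = [1.0, 2.0, 3.0, 4.0]
    values = [1.04, 1.93, 3.26, 3.88]
    residuals = [v - p for v, p in zip(values, prediction)]

    for step in (0.5, 0.1, 0.01):
        codes = quantize_residuals(residuals, step)
        approx = reconstruct(prediction, codes, step)
        worst = max(abs(a - v) for a, v in zip(approx, values))
        print(f"step={step}: codes={codes}, max error={worst:.4f}")
```

The reconstruction error of each value is at most half of `step`, so a decoder can choose the resolution by choosing how finely the residuals were quantized.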