Shrink: Data Compression by Semantic Extraction and Residuals Encoding

📅 2024-10-09

🏛️ BigData Congress [Services Society]

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the trade-off between high-fidelity reconstruction and high compression ratios in IoT time-series data compression, this paper proposes a semantic-segment–residual dual-path compression framework. First, adaptive dynamic error thresholds are employed to extract linear semantic segments and construct a reusable knowledge base. Second, quantized encoding is applied to the fitting residuals, enabling unified support for multi-level lossy and lossless reconstruction. The key innovations include: (i) the first knowledge-driven paradigm integrating segmented linear fitting with residual compression; (ii) dynamic error thresholding adapted to local data characteristics; and (iii) scalability—compression ratio improves with increasing data volume. Evaluated on multiple real-world IoT datasets, the method achieves 2–5× higher compression ratios than state-of-the-art approaches, near-zero reconstruction error, and end-to-end latency under 10 ms.

📝 Abstract

The distributed data infrastructure in Internet of Things (IoT) ecosystems requires efficient data-series compression methods that can also meet different accuracy demands. However, the performance of existing compression methods degrades sharply when ultra-accurate data recovery is required. In this paper, we introduce Shrink, a novel, highly accurate data compression method that offers a higher compression ratio and lower runtime than prior compressors. Shrink extracts data semantics in the form of linear segments to construct a compact knowledge base, using a dynamic error threshold that adapts to data characteristics. It then captures the remaining data details as residuals to support lossy compression at diverse resolutions as well as lossless compression. Because Shrink effectively identifies repeated semantics, its compression ratio increases with data size. Our experimental evaluation demonstrates that Shrink outperforms state-of-the-art methods, achieving a twofold to fivefold improvement in compression ratio depending on the dataset.
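The two-stage idea in the abstract (linear segments first, residuals second) can be illustrated with a minimal Python sketch. It is not the Shrink algorithm: it assumes a fixed error threshold `eps` rather than the paper's dynamic one, uses a simple greedy slope-range fit as a stand-in for the segment extraction, and omits the knowledge base of repeated segments. All function names and the residual quantization step `step` are hypothetical.

```python
from typing import List, Tuple

# (start index, length, slope, base value) -- a hypothetical segment layout
Segment = Tuple[int, int, float, float]

def extract_segments(values: List[float], eps: float) -> List[Segment]:
    """Greedily fit lines anchored at each segment's first point.

    A segment is extended while some slope keeps every covered point
    within +/- eps of the line; otherwise a new segment starts.
    """
    segments: List[Segment] = []
    n, start = len(values), 0
    while start < n:
        base = values[start]
        lo, hi = float("-inf"), float("inf")  # feasible slope range
        end = start + 1
        while end < n:
            dx = end - start
            new_lo = max(lo, (values[end] - eps - base) / dx)
            new_hi = min(hi, (values[end] + eps - base) / dx)
            if new_lo > new_hi:  # no single slope fits all points any more
                break
            lo, hi = new_lo, new_hi
            end += 1
        slope = 0.0 if end - start == 1 else (lo + hi) / 2
        segments.append((start, end - start, slope, base))
        start = end
    return segments

def approximate(segments: List[Segment]) -> List[float]:
    """Rebuild the coarse (lossy) series from the segments alone."""
    out: List[float] = []
    for _, length, slope, base in segments:
        out.extend(base + slope * k for k in range(length))
    return out

def quantize_residuals(values: List[float], approx: List[float],
                       step: float) -> List[int]:
    """Encode what the segments missed as integer multiples of step."""
    return [round((v - a) / step) for v, a in zip(values, approx)]

def reconstruct(approx: List[float], codes: List[int],
                step: float) -> List[float]:
    """Refine the coarse series with the quantized residuals."""
    return [a + c * step for a, c in zip(approx, codes)]

if __name__ == "__main__":
    data = [0.0, 1.0, 2.1, 2.9, 4.0, 10.0, 9.0, 8.2, 7.0]
    segs = extract_segments(data, eps=0.2)
    coarse = approximate(segs)
    codes = quantize_residuals(data, coarse, step=0.01)
    fine = reconstruct(coarse, codes, step=0.01)
    print(len(segs), max(abs(a - b) for a, b in zip(data, fine)))
```

In this sketch the segments alone bound the error by `eps`, while adding the residual codes tightens the bound to about `step / 2`. Choosing a smaller `step` gives a finer reconstruction, which mirrors the multi-resolution behavior the abstract describes.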

Problem

Research questions and friction points this paper is trying to address.

Efficient data-series compression for IoT ecosystems

Ultra-accurate data recovery in compression methods

Dynamic error threshold adaptation for data characteristics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic extraction for compression

Dynamic error threshold adaptation

Residual encoding for flexibility
