DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction

📅 2025-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing post-hoc quality assessment (e.g., PSNR, L2 norm) for bounded-loss compression of scientific time-series data incurs high computational overhead and lacks real-time feedback capability. Method: We propose the first general-purpose deep surrogate model tailored for time-series scientific data. Our approach employs a two-stage decoupled architecture that separates computationally expensive compression feature extraction from lightweight quality metric prediction. To enhance temporal robustness and generalization, we introduce a time-aware Mixture-of-Experts (MoE) mechanism. The model is trained end-to-end across multiple compressors, diverse quality metrics, and heterogeneous scientific datasets. Contribution/Results: Evaluated on four real-world scientific applications, our model achieves prediction errors consistently below 10%, significantly outperforming state-of-the-art methods. It enables on-the-fly, demand-driven optimization of compression parameters—reducing both I/O and computational overhead substantially.
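The demand-driven parameter optimization mentioned above could look like the sketch below: a cheap surrogate prediction replaces decompression plus full metric computation in the search loop. The function name, the candidate-bound list, and the toy surrogate are illustrative assumptions, not the paper's interface.

```python
import math

def choose_error_bound(surrogate, features, target_psnr, candidate_bounds):
    """Scan candidate error bounds from loosest (largest) to tightest and
    return the first one whose surrogate-predicted PSNR meets the target;
    no decompression or exact metric computation happens in the loop."""
    for eb in sorted(candidate_bounds, reverse=True):
        if surrogate(features, eb) >= target_psnr:
            return eb
    return min(candidate_bounds)  # fall back to the tightest bound

# Toy surrogate (illustrative only): predicted PSNR falls as the bound grows.
toy_surrogate = lambda feats, eb: -20.0 * math.log10(eb)

eb = choose_error_bound(toy_surrogate, None, target_psnr=59.9,
                        candidate_bounds=[1e-2, 1e-3, 1e-4, 1e-5])
```

Because each surrogate call is lightweight, this selection can run per timestep, which is what makes on-the-fly tuning of compression parameters feasible.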

📝 Abstract
Error-bounded lossy compression techniques have become vital for scientific data management and analytics, given the ever-increasing volume of data generated by modern scientific simulations and instruments. Nevertheless, assessing data quality post-compression remains computationally expensive due to the intensive nature of metric calculations. In this work, we present a general-purpose deep-surrogate framework for lossy compression quality prediction (DeepCQ), with the following key contributions: 1) We develop a surrogate model for compression quality prediction that is generalizable to different error-bounded lossy compressors, quality metrics, and input datasets; 2) We adopt a novel two-stage design that decouples the computationally expensive feature-extraction stage from the lightweight metric prediction, enabling efficient training and modular inference; 3) We optimize the model performance on time-evolving data using a mixture-of-experts design. Such a design enhances the robustness when predicting across simulation timesteps, especially when the training and test data exhibit significant variation. We validate the effectiveness of DeepCQ on four real-world scientific applications. Our results highlight the framework's exceptional predictive accuracy, with prediction errors generally under 10% across most settings, significantly outperforming existing methods. Our framework empowers scientific users to make informed decisions about data compression based on their preferred data quality, thereby significantly reducing I/O and computational overhead in scientific data analysis.
Problem

Research questions and friction points this paper is trying to address.

Computing quality metrics after error-bounded lossy compression is too expensive for real-time feedback.
Existing quality predictors do not generalize across compressors, metrics, and datasets.
Post-compression quality assessment adds substantial I/O and computational overhead to scientific data analysis.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep-surrogate model for lossy compression quality prediction
Two-stage design decouples feature extraction from prediction
Mixture-of-experts enhances robustness for time-evolving data
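The two-stage split and time-aware gating listed above can be sketched as follows. This is a minimal illustration under stated assumptions: the statistical feature extractor, linear experts, and distance-based softmax gate are stand-ins chosen for clarity, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Stage 1: expensive feature extraction, run once per data block ---
def extract_features(block: np.ndarray) -> np.ndarray:
    """Stand-in for the heavy compression-feature extractor: reduces a raw
    data block to a small feature vector that downstream heads can reuse."""
    return np.array([block.mean(), block.std(), np.abs(np.diff(block)).mean()])

# --- Stage 2: lightweight time-aware mixture-of-experts prediction head ---
def gate_weights(t: float, centers: np.ndarray, tau: float = 5.0) -> np.ndarray:
    """Softmax gate: experts whose 'time center' is closest to the current
    simulation timestep t receive the largest mixing weight."""
    logits = -np.abs(centers - t) / tau
    e = np.exp(logits - logits.max())
    return e / e.sum()

def predict_quality(features, t, experts, centers) -> float:
    """Mix per-expert predictions (here: linear maps) with the time gate."""
    w = gate_weights(t, centers)
    preds = np.array([W @ features + b for W, b in experts])
    return float(w @ preds)

# Toy usage: 3 experts covering early/mid/late simulation timesteps.
centers = np.array([0.0, 50.0, 100.0])
experts = [(rng.normal(size=3), rng.normal()) for _ in range(3)]
feats = extract_features(rng.normal(size=1024))   # stage 1, computed once
psnr_hat = predict_quality(feats, t=42.0, experts=experts, centers=centers)
```

Because stage 1 runs once per block while stage 2 is a handful of small matrix-vector products, many metrics (PSNR, SSIM, etc.) can be predicted from the same cached features at negligible cost, which is the point of the decoupled design.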