🤖 AI Summary
To address the challenge of efficient, error-bounded lossy compression for massive scientific datasets, this paper proposes GRAPHCOMP, a compression method based on a temporal graph autoencoder. It segments the original grid data irregularly and builds a graph representation that preserves spatial and temporal correlations, then learns latent representations that substantially reduce the size of the graph. Decompression uses the learned graph model together with the latent representation to reconstruct an approximation of the original data, and the reconstruction is guaranteed to satisfy a user-specified point-wise error bound. Evaluated on large-scale real and synthetic scientific datasets against HPEZ, SZ3.1, SPERR, and ZFP, the method achieves the highest compression ratio on most datasets, outperforming the second-best method by 22% to 50%.
📝 Abstract
The generation of voluminous scientific data poses significant challenges for efficient storage, transfer, and analysis. Recently, error-bounded lossy compression methods have emerged due to their ability to achieve high compression ratios while controlling data distortion. However, they often overlook the inherent spatial and temporal correlations within scientific data, thus missing opportunities for higher compression. In this paper, we propose GRAPHCOMP, a novel graph-based method for error-bounded lossy compression of scientific data. We perform irregular segmentation of the original grid data and generate a graph representation that preserves the spatial and temporal correlations. Inspired by Graph Neural Networks (GNNs), we then propose a temporal graph autoencoder to learn latent representations that significantly reduce the size of the graph, effectively compressing the original data. Decompression reverses the process and utilizes the learnt graph model together with the latent representation to reconstruct an approximation of the original data. The decompressed data are guaranteed to satisfy a user-defined point-wise error bound. We compare our method against the state-of-the-art error-bounded lossy methods (i.e., HPEZ, SZ3.1, SPERR, and ZFP) on large-scale real and synthetic data. GRAPHCOMP consistently achieves the highest compression ratio across most datasets, outperforming the second-best method by margins ranging from 22% to 50%.
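The abstract does not say how the point-wise error bound is enforced. One common way to obtain such a guarantee on top of any approximate decoder is to quantize the residual between the original and the approximation into integer codes, store those codes alongside the latent representation, and add them back during decompression. The sketch below illustrates that generic idea only; it is not taken from the paper. The function name `correct_to_bound`, the noisy array standing in for a decoder output, and the bound `eps` are all hypothetical.

```python
import numpy as np

def correct_to_bound(original, approx, eps):
    """Quantize the residual so that |original - corrected| <= eps at every point.

    Generic residual-quantization sketch (an assumption, not the paper's method).
    The bound holds up to floating-point rounding.
    """
    residual = original - approx
    # Bins of width 2*eps: rounding to the nearest bin leaves an error of at most eps.
    # Points where the approximation is already strictly within eps get code 0.
    codes = np.rint(residual / (2.0 * eps)).astype(np.int64)
    corrected = approx + codes * (2.0 * eps)
    return corrected, codes

# Hypothetical data: a small 3D field and a noisy stand-in for a decoder output.
rng = np.random.default_rng(0)
original = rng.normal(size=(4, 8, 8))
approx = original + rng.normal(scale=0.05, size=original.shape)
eps = 0.01

corrected, codes = correct_to_bound(original, approx, eps)
max_err = np.max(np.abs(original - corrected))
nonzero_fraction = np.count_nonzero(codes) / codes.size
print(f"max point-wise error: {max_err:.6f} (bound {eps})")
print(f"fraction of points needing a correction code: {nonzero_fraction:.2f}")
```

In a scheme like this, the better the learned model predicts the data, the more codes are zero and the more cheaply they can be entropy-coded, which is how prediction quality would translate into compression ratio.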