Lossy Compression of Scientific Data: Applications Constraints and Requirements

📅 2025-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Exponential growth in scientific data has outpaced network bandwidth, storage capacity, and analytical capabilities, necessitating lossy compression. However, existing research lacks application-specific fidelity requirements aligned with scientific discovery goals, leading to a disconnect between algorithm design and real-world needs. Method: We conduct a systematic survey of nine representative scientific domains—including climate modeling, combustion, and cosmology—to identify cross-cutting quality constraints (e.g., error bounds on key physical quantities), compression ratios, and throughput requirements. We then develop a “discovery-oriented” compression evaluation framework and analyze error-control mechanisms and applicability boundaries of mainstream tools (SZ, ZFP, MGARD). Contribution/Results: This work delivers the first multi-disciplinary white paper on lossy compression requirements for scientific data, enabling verifiable benchmarking and co-evolution of high-fidelity, high-performance compression toolchains.
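For orientation, the quality and performance metrics this summary refers to have standard definitions in the lossy-compression literature (these are conventional background formulas, not notation taken from the paper itself). Writing X = (x_1, …, x_n) for the original field and X̂ for its decompressed reconstruction:

```latex
% Compression ratio: original size over compressed size (higher is better).
\mathrm{CR} = \frac{\operatorname{size}(X)}{\operatorname{size}(X_{\mathrm{compressed}})}

% Pointwise absolute error bound: no reconstructed value drifts more than eps.
\max_i \, \lvert x_i - \hat{x}_i \rvert \le \epsilon_{\mathrm{abs}}

% Value-range relative bound, the "REL" mode of SZ-style compressors.
\max_i \, \frac{\lvert x_i - \hat{x}_i \rvert}{\max_j x_j - \min_j x_j} \le \epsilon_{\mathrm{rel}}
```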

📝 Abstract
Increasing data volumes from scientific simulations and instruments (supercomputers, accelerators, telescopes) often exceed network, storage, and analysis capabilities. The scientific community's response to this challenge is scientific data reduction. Reduction can take many forms, such as triggering, sampling, filtering, quantization, and dimensionality reduction. This report focuses on a specific technique: lossy compression. Lossy compression retains all data points, leveraging correlations in the data and a controlled loss of accuracy. Quality constraints, especially for quantities of interest, are crucial for preserving scientific discoveries. User requirements also include compression ratio and speed. While many papers have been published on lossy compression techniques and reference datasets are shared by the community, there is a lack of detailed specifications of application needs that can guide lossy compression researchers and developers. This report fills this gap by reporting on the requirements and constraints of nine scientific applications covering a large spectrum of domains (climate, combustion, cosmology, fusion, light sources, molecular dynamics, quantum circuit simulation, seismology, and system logs). The report also details key lossy compression technologies (SZ, ZFP, MGARD, LC, SPERR, DCTZ, TEZip, LibPressio), discussing their history, principles, error control, hardware support, features, and impact. By presenting both application needs and compression technologies, the report aims to inspire new research to fill existing gaps.
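Since the abstract names LibPressio as the common interface over these compressors, a short sketch of how an error bound is actually requested may help. This is a hedged illustration, not code from the report: `PressioCompressor.from_config`, `encode`, and `decode` follow LibPressio's documented Python bindings, but the `pressio:abs` option name and the availability of the `sz` plugin depend on the installed version, so treat the configuration details as assumptions.

```python
import numpy as np
from libpressio import PressioCompressor  # assumes LibPressio's Python bindings are installed

# Synthetic stand-in for a simulation field; the report's real datasets are domain-specific.
field = np.sin(np.linspace(0, 8 * np.pi, 512 * 512, dtype=np.float32)).reshape(512, 512)

# Ask the SZ plugin for a pointwise absolute error bound of 1e-4.
# "pressio:abs" is LibPressio's compressor-agnostic absolute-bound option
# (exact option names vary across LibPressio/SZ versions).
compressor = PressioCompressor.from_config({
    "compressor_id": "sz",
    "compressor_config": {"pressio:abs": 1e-4},
})

compressed = compressor.encode(field)                 # 1-D compressed buffer
restored = compressor.decode(compressed, np.empty_like(field))

print("compression ratio:", field.nbytes / len(compressed))
print("max abs error:    ", np.max(np.abs(field - restored)))
```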
Problem

Research questions and friction points this paper is trying to address.

Addresses increasing scientific data volumes exceeding storage and analysis capabilities
Focuses on lossy compression techniques for scientific data reduction
Identifies gaps in application-specific needs for guiding compression research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lossy compression retains all data points, unlike sampling or filtering
Accuracy loss is error-controlled, so quality constraints on quantities of interest can be enforced (see the sketch after this list)
Detailed, per-application requirements collected to guide compression research and development
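To make "controlled reduced accuracy" concrete, below is a minimal, self-contained sketch of error-bounded uniform quantization, the core mechanism inside prediction-based compressors such as SZ. Real compressors wrap prediction and entropy coding around this step; this toy version (all names illustrative, not from the paper) only shows how the pointwise bound is enforced.

```python
import numpy as np

def quantize(data: np.ndarray, eps: float) -> np.ndarray:
    """Map each value to an integer bin of width 2*eps.

    Reconstructing from the bin center guarantees a pointwise
    absolute error of at most eps.
    """
    return np.round(data / (2 * eps)).astype(np.int64)

def dequantize(codes: np.ndarray, eps: float) -> np.ndarray:
    return codes.astype(np.float64) * (2 * eps)

rng = np.random.default_rng(0)
field = np.cumsum(rng.normal(size=1_000_000))  # smooth-ish stand-in for a science field

eps = 1e-2
recon = dequantize(quantize(field, eps), eps)

# The bound holds pointwise (tiny slack for floating-point rounding).
assert np.max(np.abs(field - recon)) <= eps * (1 + 1e-9)

# A real compressor would next entropy-code the integer codes; their low
# entropy relative to raw doubles is where the compression ratio comes from.
print("distinct codes:", np.unique(quantize(field, eps)).size)
```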
👥 Authors
Franck Cappello
Argonne National Laboratory, IEEE Fellow
Parallel Processing, Parallel Computing, High Performance Computing, Fault Tolerance, Data Compression
Allison Baker
NSF National Center for Atmospheric Research, Boulder, CO, USA
Ebru Bozdağ
Colorado School of Mines, Golden, CO, USA
M. Burtscher
Texas State University, San Marcos, TX, USA
Kyle Chard
University of Chicago and Argonne National Laboratory
computer science, distributed systems, high performance computing, scientific computing
Sheng Di
Argonne National Laboratory, IEEE Senior Member
HPC, Data Compression, Resilience, Cloud/Grid Computing/P2P, Federated Learning
Paul Christopher O'Grady
Stanford University, Stanford, CA, USA
Peng Jiang
University of Iowa, Iowa City, IA, USA
Shaomeng Li
NSF National Center for Atmospheric Research, Boulder, CO, USA
Erik Lindahl
Professor of Biophysics, Linköping University
Life science, Membrane proteins, cryo-EM, molecular dynamics, AI
Peter Lindstrom
Lawrence Livermore National Laboratory
Computer graphics, scientific visualization, geometry processing, data compression
M. Lundborg
KTH Department of Applied Physics, Solna, Sweden
Kai Zhao
Florida State University, Tallahassee, FL, USA
Xin Liang
University of Kentucky, Lexington, KY, USA
Masaru Nagaso
Colorado School of Mines, Golden, CO, USA
Kento Sato
RIKEN Center for Computational Science (RIKEN R-CCS)
High performance computing, I/O & Big data, Machine/Deep learning, Fault tolerance, Debugging
Amarjit Singh
RIKEN R-CCS, Japan
Seung Woo Son
University of Massachusetts Lowell
High performance computing, Storage, Computer Architecture, Embedded Systems
Dingwen Tao
Chinese Academy of Sciences, IEEE/ACM Senior Member
High Performance Computing, Data Reduction, Deep Learning, Systems for ML, GPU
Jiannan Tian
Assistant Professor, Oakland University
HPC/AI, large-scale data processing and analytics, HW-accelerated compression
Robert Underwood
Assistant Computer Scientist, Argonne National Laboratory
Data for AI for Science, Lossy Compression, Distributed Computing, Reliable Computer Infrastructure
Kazutomo Yoshii
Argonne National Laboratory
HPC, edge/embedded, dataflow, AI accelerators, FPGA/ASIC
Danylo Lykov
Argonne National Laboratory, Lemont, IL, USA
Yuri Alexeev
Senior Quantum Algorithm Engineer
Quantum Information Science
K. Felker
Argonne National Laboratory, Lemont, IL, USA