Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction

📅 2026-05-08

🤖 AI Summary
This work addresses the rapid growth of Earth observation data, where conventional compression targets only storage and transmission and is not task-aware. It proposes a generative compression framework that learns historical priors from Earth observation archives, exploiting the fact that the same evolving planet is measured repeatedly. The models are trained end-to-end on the LineShine Armv9 CPU supercomputer, sustaining 1.54 EFLOP/s and peaking at 2.16 EFLOP/s. Co-optimization across model design, kernels, memory hierarchy, runtime, and parallelism enables on-demand compression ratios from 100× to 10,000× across downstream tasks. The authors position this as turning compression from a passive utility into an active, task-adaptive foundation for data acquisition, delivery, storage, and scientific use.
📝 Abstract
Earth observation is becoming one of the largest data-producing activities in science, yet current pipelines still treat compression as a storage and transmission tool rather than a new way to use data. We present a generative compression framework that learns from historical Earth observation archives and enables on-demand 100x to 10,000x data reduction across downstream tasks. Unlike general visual data, Earth observation repeatedly measures the same evolving planet, making historical-prior learning feasible for extreme compression. To realize this paradigm, we train large generative compression models at exascale on the LineShine Armv9 CPU supercomputer, with co-optimization across model design, kernels, memory hierarchy, runtime, and parallelism. Our implementation sustains 1.54 EFLOP/s and peaks at 2.16 EFLOP/s in end-to-end training. This work shows that historical-prior generative compression can turn Earth observation data into an active, task-adaptive foundation for acquisition, delivery, storage, and scientific use.
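The intuition that repeated measurement of the same scene makes a historical prior useful for compression can be illustrated with a toy example. The sketch below is not the paper's generative model: it swaps in a plain lossless residual coder (per-pixel mean of past observations plus `zlib`), and every name, the synthetic scene, and the noise level are assumptions chosen for illustration. Its ratios are nowhere near the 100x to 10,000x reported for the generative approach; it only shows that coding against a prior beats coding the observation alone.

```python
import random
import zlib

def make_observation(base, rng, noise=2):
    """Simulate a revisit: the same scene plus a small per-pixel change (assumed model)."""
    return [b + rng.randint(-noise, noise) for b in base]

def historical_prior(history):
    """Per-pixel rounded mean of past observations, used here as the 'prior'."""
    n = len(history)
    return [round(sum(px) / n) for px in zip(*history)]

def encode_with_prior(obs, prior):
    """Losslessly code only the residual against the prior."""
    # Residuals are small signed integers; wrap them into one byte each.
    residual = bytes((o - p) & 0xFF for o, p in zip(obs, prior))
    return zlib.compress(residual, 9)

def decode_with_prior(payload, prior):
    """Rebuild the observation from the coded residual and the shared prior."""
    residual = zlib.decompress(payload)
    # Undo the modulo-256 wrap: bytes >= 128 represent negative residuals.
    return [p + (r - 256 if r >= 128 else r) for r, p in zip(residual, prior)]

# Synthetic 64x64 single-band scene with high-entropy pixel values (hypothetical data).
rng = random.Random(0)
base = [rng.randint(10, 245) for _ in range(64 * 64)]
history = [make_observation(base, rng) for _ in range(8)]
new_obs = make_observation(base, rng)

prior = historical_prior(history)
baseline = zlib.compress(bytes(new_obs), 9)   # no prior: code the raw pixels
payload = encode_with_prior(new_obs, prior)   # with prior: code the residual only
restored = decode_with_prior(payload, prior)

ratio_baseline = len(new_obs) / len(baseline)
ratio_prior = len(new_obs) / len(payload)
print(f"without prior: {ratio_baseline:.2f}x, with prior: {ratio_prior:.2f}x")
```

Both encoder and decoder must hold the same prior, which is why an archive-derived prior suits Earth observation: the history is already stored on the receiving side, so only the change needs to be sent.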
Problem

Research questions and friction points this paper is trying to address.

Earth observation
data compression
generative model
historical priors
exascale computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative compression
historical priors
exascale training
Earth observation
data reduction
Authors
Jinxiao Zhang
Institute of Data and Information, Tsinghua Shenzhen International Graduate School
Runmin Dong
Sun Yat-Sen University
Xiyong Wu
Institute of Data and Information, Tsinghua Shenzhen International Graduate School
Xihan Huang
Institute of Data and Information, Tsinghua Shenzhen International Graduate School
Shenggan Cheng
National University of Singapore
Machine Learning Systems, High Performance Computing, Deep Learning
Yunkai Yang
Sun Yat-Sen University
Zheng Zhou
Institute of Data and Information, Tsinghua Shenzhen International Graduate School
Yunpu Xu
Institute of Data and Information, Tsinghua Shenzhen International Graduate School
Zhaoyang Luo
Institute of Data and Information, Tsinghua Shenzhen International Graduate School
Miao Yang
Department of Earth System Science, Tsinghua University
Fan Wei
Department of Mathematics, Princeton University
Analysis, Combinatorics, Probability
Mengxuan Chen
Tsinghua University
AI4Science, machine learning, earth system model
Yang You
Postdoc, Stanford University
3D vision, computer graphics, computational geometry
Juepeng Zheng
Sun Yat-Sen University
Weijia Li
Institute of Data and Information, Tsinghua Shenzhen International Graduate School
Yutong Lu
Sun Yat-Sen University
Haohuan Fu
Tsinghua University