Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the surge in Earth observation data, where conventional compression serves only storage and transmission and is not task-aware. We propose a generative compression framework that learns historical priors by exploiting the temporal redundancy of repeated observations of the same evolving planet. Large models are trained end-to-end on the LineShine Armv9 CPU supercomputer, sustaining 1.54 EFLOP/s and peaking at 2.16 EFLOP/s. Co-optimization across model design, kernels, memory hierarchy, runtime, and parallelism enables on-demand compression ratios from 100× to 10,000× across downstream tasks. The result turns compression from a passive utility into an active, task-adaptive data foundation for acquisition, delivery, storage, and scientific use.

📝 Abstract

Earth observation is becoming one of the largest data-producing activities in science, yet current pipelines still treat compression as a storage and transmission tool rather than a new way to use data. We present a generative compression framework that learns from historical Earth observation archives and enables on-demand 100x to 10,000x data reduction across downstream tasks. Unlike general visual data, Earth observation repeatedly measures the same evolving planet, making historical-prior learning feasible for extreme compression. To realize this paradigm, we train large generative compression models at exascale on the LineShine Armv9 CPU supercomputer, with co-optimization across model design, kernels, memory hierarchy, runtime, and parallelism. Our implementation sustains 1.54 EFLOP/s and peaks at 2.16 EFLOP/s in end-to-end training. This work shows that historical-prior generative compression can turn Earth observation data into an active, task-adaptive foundation for acquisition, delivery, storage, and scientific use.
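To put the headline figures in perspective, the short sketch below works through the arithmetic. It is only an illustration: the 1 PB archive size and the function names are assumptions made here, not values or code from the paper. Only the 100x to 10,000x ratios and the 1.54 and 2.16 EFLOP/s figures come from the abstract.

```python
# Illustrative arithmetic only; archive size and helper names are hypothetical.

PEAK_EFLOPS = 2.16       # peak end-to-end training throughput (from the abstract)
SUSTAINED_EFLOPS = 1.54  # sustained end-to-end training throughput (from the abstract)


def compressed_size_bytes(raw_bytes: float, ratio: float) -> float:
    """Return the size after compression, where ratio = raw size / compressed size."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_bytes / ratio


def sustained_fraction(sustained: float, peak: float) -> float:
    """Return sustained throughput as a fraction of peak throughput."""
    if peak <= 0:
        raise ValueError("peak throughput must be positive")
    return sustained / peak


if __name__ == "__main__":
    raw = 1e15  # assumed 1 PB archive, chosen purely for illustration
    for ratio in (100, 1_000, 10_000):
        size_tb = compressed_size_bytes(raw, ratio) / 1e12
        print(f"{ratio:>6}x -> {size_tb:g} TB")
    frac = sustained_fraction(SUSTAINED_EFLOPS, PEAK_EFLOPS)
    print(f"sustained / peak = {frac:.1%}")
```

Under these assumptions, a 1 PB archive would shrink to 10 TB at 100x and to 100 GB at 10,000x, and the sustained rate is roughly 71% of the reported peak.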

Problem

Research questions and friction points this paper is trying to address.

Earth observation

data compression

generative model

historical priors

exascale computing

Innovation

Methods, ideas, or system contributions that make the work stand out.

generative compression

historical priors

exascale training