🤖 AI Summary
This work addresses the surge in Earth observation data, for which conventional compression methods, designed solely for storage and transmission, lack task awareness. We propose the first generative compression framework that integrates historical priors by exploiting the temporal redundancy of repeated observations. The system is trained end-to-end on the LineShine Armv9 CPU supercomputer, sustaining 1.54 EFLOP/s and reaching a peak of 2.16 EFLOP/s. Through holistic hardware-software co-design spanning model architecture, kernel optimization, memory hierarchy management, runtime, and parallelization strategies, the approach enables on-demand compression ratios from 100× to 10,000×. It improves efficiency across data acquisition, transmission, storage, and downstream scientific workflows, transforming compression from a passive utility into an active, task-driven data foundation.
📝 Abstract
Earth observation is becoming one of the largest data-producing activities in science, yet current pipelines still treat compression as a storage and transmission tool rather than a new way to use data. We present a generative compression framework that learns from historical Earth observation archives and enables on-demand 100x to 10,000x data reduction across downstream tasks. Unlike general visual data, Earth observation repeatedly measures the same evolving planet, making historical-prior learning feasible for extreme compression. To realize this paradigm, we train large generative compression models at exascale on the LineShine Armv9 CPU supercomputer, with co-optimization across model design, kernels, memory hierarchy, runtime, and parallelism. Our implementation sustains 1.54 EFLOP/s and peaks at 2.16 EFLOP/s in end-to-end training. This work shows that historical-prior generative compression can turn Earth observation data into an active, task-adaptive foundation for acquisition, delivery, storage, and scientific use.
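The abstract's central argument is that Earth observation keeps revisiting the same scene, so a historical prior already explains most of what a new acquisition contains. The toy sketch that follows illustrates only that redundancy argument. It uses lossless residual coding with zlib against a single prior image, not the paper's learned generative model, and it does not reproduce the task-adaptive 100x to 10,000x ratios. The scene size, change fraction, and helper names (`make_scene`, `evolve`, `encode_with_prior`, `decode_with_prior`) are hypothetical.

```python
import random
import zlib

# Toy illustration only: lossless residual coding against a historical prior.
# This is NOT the paper's generative compression model; it only shows that
# repeated observations of the same scene carry exploitable temporal redundancy.


def make_scene(n_pixels: int, seed: int) -> bytes:
    """Hypothetical 'historical' acquisition: random 8-bit texture.

    Random bytes are nearly incompressible on their own, so any size
    reduction below must come from the prior, not from the texture.
    """
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n_pixels))


def evolve(scene: bytes, change_fraction: float, seed: int) -> bytes:
    """Hypothetical new acquisition: a small fraction of pixels change slightly."""
    rng = random.Random(seed)
    pixels = bytearray(scene)
    n_changed = int(len(pixels) * change_fraction)
    for idx in rng.sample(range(len(pixels)), n_changed):
        # A shift of 1..3 (mod 256) always alters the pixel value.
        pixels[idx] = (pixels[idx] + rng.randint(1, 3)) % 256
    return bytes(pixels)


def encode_with_prior(new: bytes, prior: bytes) -> bytes:
    """Compress only the residual (new - prior) mod 256, which is mostly zeros."""
    residual = bytes((a - b) % 256 for a, b in zip(new, prior))
    return zlib.compress(residual, 9)


def decode_with_prior(payload: bytes, prior: bytes) -> bytes:
    """Invert encode_with_prior exactly: add the residual back onto the prior."""
    residual = zlib.decompress(payload)
    return bytes((r + b) % 256 for r, b in zip(residual, prior))


if __name__ == "__main__":
    prior = make_scene(64 * 64, seed=0)
    new = evolve(prior, change_fraction=0.02, seed=1)

    direct = zlib.compress(new, 9)
    with_prior = encode_with_prior(new, prior)
    assert decode_with_prior(with_prior, prior) == new  # lossless round trip

    print(f"raw bytes:         {len(new)}")
    print(f"zlib, no prior:    {len(direct)}")
    print(f"zlib on residual:  {len(with_prior)}")
    print(f"ratio vs raw:      {len(new) / len(with_prior):.1f}x")
```

The random texture is chosen so that zlib alone gains almost nothing, which isolates the contribution of the prior. A learned prior, as the abstract describes, would also have to predict how the scene evolves and which details a downstream task needs; this residual sketch attempts neither.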