🤖 AI Summary
To address the storage, transmission, and analysis bottlenecks arising from exascale scientific data volumes, this paper proposes HPDR, a lightweight data reduction framework that balances GPU acceleration with cross-platform portability. The framework integrates heterogeneous-processor scheduling, zero-copy memory access, adaptive lossy compression, and MPI+GPU co-optimized I/O. It reduces memory-transfer overhead to just 2.3% and achieves near-linear multi-GPU scalability (96% of the theoretical speedup). On the Frontier supercomputer, it delivers an end-to-end throughput of 103 TB/s, up to 3.5× higher throughput than state-of-the-art solutions, and up to 4× faster parallel I/O at scale. Its core contribution lies in unifying high throughput, low overhead, and cross-architecture portability in a single framework, establishing an efficient, scalable compression and reduction infrastructure for exascale data processing.
📝 Abstract
The rapid growth of scientific data is outpacing advancements in computing, creating challenges in storage, transfer, and analysis, particularly at the exascale. While data reduction techniques such as lossless and lossy compression help mitigate these issues, their computational overhead introduces new bottlenecks. GPU-accelerated approaches improve performance but face challenges in portability, memory transfer, and scalability on multi-GPU systems. To address these challenges, we propose HPDR, a high-performance, portable data reduction framework. HPDR supports diverse processor architectures, reducing memory-transfer overhead to 2.3% and achieving up to 3.5× faster throughput than existing solutions. It attains 96% of the theoretical speedup in multi-GPU settings. Evaluations on the Frontier supercomputer demonstrate 103 TB/s throughput and up to 4× acceleration in parallel I/O performance at scale. HPDR offers a scalable, efficient solution for managing massive data volumes in exascale computing environments.