🤖 AI Summary
Scientific computing at the convergence of HPC and AI faces challenges including fragmented toolchains, poor hardware portability, and low cross-paradigm coordination efficiency. To address these, this paper introduces the first PyTorch-level unified abstraction framework enabling tight HPC/AI co-design. Our approach features a hardware-agnostic operator registration mechanism, a dynamic workflow orchestration engine, and an automatic mixed-precision scheduler. Implemented via a C++/Python hybrid architecture, it integrates MPI+NCCL communication, ONNX Runtime extensions, an adaptive graph compiler, and a declarative task-graph DSL. Evaluated on leadership-class supercomputers—including Eagle and Perlmutter—the framework achieves a 3.2× throughput improvement in AI training and reduces end-to-end latency by 67% for coupled HPC simulation and ML inference. It has been deployed in production scientific workloads, including climate modeling and plasma simulation.
📝 Abstract
Current trends point to a future where large-scale scientific applications are tightly-coupled HPC/AI hybrids. Hence, we urgently need to invest in creating a seamless, scalable framework where HPC and AI/ML can efficiently work together and adapt to novel hardware and vendor libraries without starting from scratch every few years. The current ecosystem and sparsely-connected community are not sufficient to tackle these challenges, and we require a breakthrough catalyst for science similar to what PyTorch enabled for AI.