🤖 AI Summary
Heterogeneous distributed HPC systems pose three challenges: GPU and FPGA accelerators are complex to program, moving data across nodes is costly, and energy efficiency is hard to optimize. To address them, this paper proposes a high-level programming framework based on SYCL 2020. Its core contributions are: (1) Celerity, a novel distributed task dispatching mechanism that supports standard SYCL semantics and enables unified scheduling and load balancing across CPUs, GPUs, and FPGAs; and (2) SYnergy, a power-modeling-driven co-optimization extension that combines feedback control with multi-level memory-aware task mapping to achieve energy-aware execution. The framework is fully compatible with mainstream SYCL implementations and requires no modifications to existing SYCL code. Experimental evaluation on heterogeneous clusters demonstrates up to a 2.3× improvement in energy efficiency and a 1.8× speedup in task dispatching throughput.
📝 Abstract
Programming modern high-performance computing systems is challenging because it requires programming GPUs and other accelerators efficiently and handling data movement between nodes. The C++ language has been continuously enhanced in recent years with features that greatly increase productivity. In particular, the C++-based SYCL standard provides a powerful programming model for heterogeneous systems that can target a wide range of devices, including multicore CPUs, GPUs, FPGAs, and other accelerators, while providing high-level abstractions. This presentation introduces our research efforts to design a SYCL-based high-level programming interface that provides advanced techniques such as task distribution and energy optimization. The key insight is that SYCL semantics can be readily extended with advanced features that integrate easily into existing SYCL programs. In particular, we will highlight two SYCL extensions: Celerity, which handles workload distribution on accelerator clusters, and SYnergy, which targets energy-efficient computing.