🤖 AI Summary
This work addresses the difficulty of dynamically migrating work items, such as rays, between GPUs in multi-node, multi-GPU data-parallel computing. The authors propose RaFI, a software framework built on CUDA and MPI that gives GPU kernels a simple interface for forwarding work items to other GPUs, while the framework itself manages the underlying communication and data transfers. By hiding the details of coordinated CUDA and MPI programming, RaFI aims to simplify the development of multi-GPU applications. The authors describe the framework's motivation and implementation and show its potential in several example applications.
📝 Abstract
We present RaFI, a CUDA- and MPI-based software framework that simplifies the task of building GPU-enabled data-parallel software in which rays or similar work items need to migrate between different GPUs. RaFI provides a simple interface for CUDA kernels to forward such work items to other GPUs, while under the hood managing all the CUDA- and MPI-related work required to make this happen. We describe RaFI's motivation and implementation, and show its potential in several example applications.
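To make the idea of forwarding work items more concrete, the following is a minimal host-side C++ sketch of one bookkeeping step that a framework of this kind might perform: grouping outgoing rays by destination rank and computing the per-rank counts and offsets in the layout that a collective such as `MPI_Alltoallv` takes for its send arguments. This is not RaFI's API or implementation. The names `Ray`, `destRank`, `SendPlan`, and `buildSendPlan` are hypothetical, and the sketch deliberately makes no CUDA or MPI calls so that it compiles on its own.

```cpp
#include <cassert>
#include <vector>

// Hypothetical work item: a ray tagged with the rank (GPU) that should
// process it next. The field names are assumptions, not taken from RaFI.
struct Ray {
    float origin[3];
    float direction[3];
    int   destRank;
};

// Outgoing items grouped by destination, plus per-rank counts and offsets
// in the layout used for the send side of an all-to-all exchange.
struct SendPlan {
    std::vector<Ray> sendBuffer;   // items grouped by destination rank
    std::vector<int> sendCounts;   // number of items bound for each rank
    std::vector<int> sendOffsets;  // start of each rank's group in sendBuffer
};

SendPlan buildSendPlan(const std::vector<Ray>& outgoing, int numRanks) {
    SendPlan plan;
    plan.sendCounts.assign(numRanks, 0);
    plan.sendOffsets.assign(numRanks, 0);

    // Pass 1: count how many items go to each rank.
    for (const Ray& r : outgoing) {
        assert(r.destRank >= 0 && r.destRank < numRanks);
        plan.sendCounts[r.destRank]++;
    }

    // Exclusive prefix sum turns the counts into starting offsets.
    for (int i = 1; i < numRanks; ++i) {
        plan.sendOffsets[i] = plan.sendOffsets[i - 1] + plan.sendCounts[i - 1];
    }

    // Pass 2: scatter each item into its destination's slot range,
    // preserving the original order within each rank.
    std::vector<int> cursor = plan.sendOffsets;
    plan.sendBuffer.resize(outgoing.size());
    for (const Ray& r : outgoing) {
        plan.sendBuffer[cursor[r.destRank]++] = r;
    }
    return plan;
}
```

In a real multi-GPU setting the counting and scattering would more plausibly run on the device, with the resulting buffers handed to MPI. The sketch only illustrates the kind of bookkeeping that an interface like the one described would hide from kernel authors.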