Performance Portable Gradient Computations Using Source Transformation

📅 2025-07-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automatic differentiation (AD) in C++ scientific computing faces significant deployment challenges due to the language’s complexity and limited support for heterogeneous architectures—particularly GPUs. Method: This paper proposes a high-performance, cross-platform AD approach based on source-to-source transformation. It extends the Clad AD framework to support the Kokkos abstraction layer, enabling unified gradient generation and optimization across diverse GPU architectures—including NVIDIA H100, AMD MI250x, and Intel Ponte Vecchio—while preserving Kokkos’ portable programming model. Contribution/Results: The method automatically generates efficient reverse-mode gradient code without compromising portability. Experimental evaluation shows that gradient computation overhead remains bounded at ≤2.17× the original function’s execution time. This advancement substantially enhances both the practicality and performance portability of C++ in differentiable scientific simulation and AI-integrated workflows.

📝 Abstract
Derivative computation is a key component of optimization, sensitivity analysis, uncertainty quantification, and nonlinear solvers. Automatic differentiation (AD) is a powerful technique for evaluating such derivatives and has in recent years been integrated into programming environments such as JAX, PyTorch, and TensorFlow to support the derivative computations needed for training machine learning models, resulting in widespread use of these technologies. The C++ language has become the de facto standard for scientific computing due to numerous factors, yet language complexity has made the adoption of AD technologies for C++ difficult, hampering the incorporation of powerful differentiable programming approaches into C++ scientific simulations. This is exacerbated by the increasing prevalence of architectures such as GPUs, which have limited memory capacity and require massive thread-level concurrency. Portable scientific codes rely on domain-specific programming models such as Kokkos, making AD for such codes even more complex. In this paper, we investigate source transformation-based automatic differentiation using Clad to automatically generate portable and efficient gradient computations for Kokkos-based code. We discuss the modifications to Clad required to differentiate Kokkos abstractions, and we illustrate the feasibility of the proposed strategy by comparing the wall-clock time of the generated gradient code with that of the input function on cutting-edge GPU architectures: NVIDIA H100, AMD MI250x, and Intel Ponte Vecchio. For these three architectures and the considered example, evaluating up to 10,000 entries of the gradient took at most 2.17× the wall-clock time of evaluating the input function.
Problem

Research questions and friction points this paper is trying to address.

Enable automatic differentiation for C++ scientific computing
Support gradient computations on diverse GPU architectures
Integrate AD with Kokkos for performance portability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Source transformation-based automatic differentiation using Clad
Portable gradient computations for Kokkos-based code
Efficient gradient generation for multiple GPU architectures