A Memory Efficient Adjoint Method to Enable Billion Parameter Optimization on a Single GPU in Dynamic Problems

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
In dynamic optimization, conventional adjoint methods require storing the full spatiotemporal wavefield, resulting in memory consumption scaling linearly with problem size—severely limiting scalability for large-scale problems. To address this, we propose an approximate adjoint method grounded in the superposition principle, reducing memory complexity for sensitivity computation from *O(TN)* to *O(N)*, where *T* is the number of time steps and *N* the number of spatial degrees of freedom. The method avoids storing the entire time-history wavefield, instead retaining only a few localized temporal states. Integrated with a CUDA-accelerated finite-difference forward solver, it enables iterative sensitivity updates. On an NVIDIA A100 GPU, we achieve, for the first time, billion-parameter-scale dynamic full-waveform inversion and transient acoustic topology optimization. Memory usage is reduced by one to two orders of magnitude, with controlled accuracy degradation (<5%).
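A rough back-of-envelope sketch of what this complexity reduction means in bytes. The values of *T*, *N*, and the precision below are assumed for illustration only and are not taken from the paper:

```python
# Hypothetical sizes, chosen only to illustrate O(T*N) vs. O(N) scaling.
N = 1_000_000_000      # spatial degrees of freedom (billion-parameter scale)
T = 1_000              # number of time steps (assumed value)
BYTES_PER_VALUE = 4    # float32

full_history_tb = N * T * BYTES_PER_VALUE / 1e12  # conventional adjoint: O(T*N)
few_states_gb = 3 * N * BYTES_PER_VALUE / 1e9     # a few temporal states: O(N)

print(f"full time history: {full_history_tb:.0f} TB")  # far beyond any single GPU
print(f"a few states:      {few_states_gb:.0f} GB")    # fits in an 80 GB A100
```

With these illustrative numbers, storing the full history would need terabytes, while a few temporal states stay within a single A100's memory.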

📝 Abstract
Dynamic optimization is currently limited by sensitivity computations that require information from the full forward and adjoint wave fields. Since the forward and adjoint solutions are computed in opposing time directions, the forward solution must be stored. This requires a substantial amount of memory for large-scale problems, even when using checkpointing or data compression techniques. As a result, the problem size is memory-bound rather than bound by wall-clock time when working with modern GPU-based implementations that have limited memory capacity. To overcome this limitation, we introduce a new approach for approximate sensitivity computation based on the adjoint method (for self-adjoint problems) that relies on the principle of superposition. The approximation allows an iterative computation of the sensitivity, reducing the memory burden to that of the solution at a small number of time steps, i.e., to the number of degrees of freedom. This enables sensitivity computations for problems with billions of degrees of freedom on current GPUs, such as NVIDIA's A100 (from 2020). We demonstrate the approach on full waveform inversion and transient acoustic topology optimization problems, relying on a highly efficient finite-difference forward solver implemented in CUDA. Phenomena such as damping cannot be considered, as the approximation technique is limited to self-adjoint problems.
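The abstract does not spell out the algorithm, but the general idea it gestures at — that for a lossless, self-adjoint wave equation the forward field can be reconstructed in reverse from its final states while the adjoint field is propagated backward, so the sensitivity is accumulated with only O(N) memory — can be sketched on a 1D toy problem. Everything below (the discretization, the time-reversal reconstruction, the source/residual signals) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def laplacian(u):
    """Second-order finite-difference Laplacian in 1D, zero Dirichlet boundaries."""
    lap = np.zeros_like(u)
    lap[1:-1] = u[2:] - 2.0 * u[1:-1] + u[:-2]
    return lap

def gradient_full_storage(c2, src, res, nt, isrc, irec):
    """Reference adjoint gradient storing the whole forward history: O(T*N) memory."""
    n = c2.size
    u = np.zeros((nt + 1, n))
    for t in range(1, nt):                       # forward leapfrog: u_tt = c^2 u_xx + s
        u[t + 1] = 2*u[t] - u[t - 1] + c2 * laplacian(u[t])
        u[t + 1, isrc] += src[t]
    g = np.zeros(n)
    lam_next, lam = np.zeros(n), np.zeros(n)     # adjoint marches backward in time
    for t in range(nt - 1, 0, -1):
        lam_prev = 2*lam - lam_next + c2 * laplacian(lam)
        lam_prev[irec] += res[t]
        g += lam * (u[t + 1] - 2*u[t] + u[t - 1])  # correlate lam with u_tt
        lam_next, lam = lam, lam_prev
    return g

def gradient_time_reversed(c2, src, res, nt, isrc, irec):
    """Memory-lean variant keeping only a handful of states: O(N) memory.
    The forward field is rebuilt backward by time reversal, which works
    only for the lossless (self-adjoint) wave equation -- no damping."""
    n = c2.size
    u_prev, u = np.zeros(n), np.zeros(n)         # forward pass, keep last two states
    for t in range(1, nt):
        u_next = 2*u - u_prev + c2 * laplacian(u)
        u_next[isrc] += src[t]
        u_prev, u = u, u_next
    u_tp1, u_t = u, u_prev                       # final states u[nt], u[nt-1]
    g = np.zeros(n)
    lam_next, lam = np.zeros(n), np.zeros(n)
    for t in range(nt - 1, 0, -1):
        u_tm1 = 2*u_t - u_tp1 + c2 * laplacian(u_t)  # reverse the forward stencil
        u_tm1[isrc] += src[t]                        # re-inject the known source
        lam_prev = 2*lam - lam_next + c2 * laplacian(lam)
        lam_prev[irec] += res[t]
        g += lam * (u_tp1 - 2*u_t + u_tm1)
        u_tp1, u_t = u_t, u_tm1
        lam_next, lam = lam, lam_prev
    return g

# Tiny driver: both variants should agree to floating-point round-off.
n, nt = 101, 200
c2 = np.full(n, 0.2)                             # c^2 dt^2 / dx^2, within CFL limit
src = np.sin(0.25 * np.arange(nt))               # arbitrary illustrative source
res = np.cos(0.15 * np.arange(nt))               # arbitrary illustrative residual
g_ref = gradient_full_storage(c2, src, res, nt, isrc=20, irec=80)
g_lean = gradient_time_reversed(c2, src, res, nt, isrc=20, irec=80)
print("max |difference|:", np.max(np.abs(g_ref - g_lean)))
```

The time-symmetric leapfrog stencil makes the backward reconstruction exact in exact arithmetic; adding a damping term would break this symmetry, which mirrors the abstract's restriction to self-adjoint problems. Note the paper's actual method additionally uses a superposition-based approximation not reproduced in this toy.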
Problem

Research questions and friction points this paper is trying to address.

Reducing memory usage for adjoint sensitivity computations
Enabling billion-parameter optimization on single GPU
Overcoming memory limitations in dynamic optimization problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adjoint method using superposition principle
Iterative sensitivity computation reducing memory
Enables billion-parameter optimization on single GPU
Leon Herrmann
Chair of Data Engineering in Construction, Bauhaus-Universität Weimar, Coudraystraße 13 b, 99423, Weimar, Germany
Tim Bürchner
Chair of Computational Modeling and Simulation, Technical University of Munich, School of Engineering and Design, Arcisstraße 21, Munich, 80333, Germany
László Kudela
Chair of Data Engineering in Construction, Bauhaus-Universität Weimar, Coudraystraße 13 b, 99423, Weimar, Germany
Stefan Kollmannsberger
Bauhaus-Universität Weimar
numerical mechanics, data-driven modeling, from modeling to analysis, additive manufacturing