A Memory Efficient Adjoint Method to Enable Billion Parameter Optimization on a Single GPU in Dynamic Problems

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
In dynamic optimization, conventional adjoint methods require storing the full spatiotemporal wavefield, resulting in memory consumption scaling linearly with problem size—severely limiting scalability for large-scale problems. To address this, we propose an approximate adjoint method grounded in the superposition principle, reducing memory complexity for sensitivity computation from *O(TN)* to *O(N)*, where *T* is the number of time steps and *N* the number of spatial degrees of freedom. The method avoids storing the entire time-history wavefield, instead retaining only a few localized temporal states. Integrated with a CUDA-accelerated finite-difference forward solver, it enables iterative sensitivity updates. On an NVIDIA A100 GPU, we achieve, for the first time, billion-parameter-scale dynamic full-waveform inversion and transient acoustic topology optimization. Memory usage is reduced by one to two orders of magnitude, with controlled accuracy degradation (<5%).
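A rough back-of-envelope sketch of what this complexity reduction means in bytes. The values of *T*, *N*, and the precision below are assumed for illustration only and are not taken from the paper:

```python
# Hypothetical sizes, chosen only to illustrate O(T*N) vs. O(N) scaling.
N = 1_000_000_000      # spatial degrees of freedom (billion-parameter scale)
T = 1_000              # number of time steps (assumed value)
BYTES_PER_VALUE = 4    # float32

full_history_tb = N * T * BYTES_PER_VALUE / 1e12  # conventional adjoint: O(T*N)
few_states_gb = 3 * N * BYTES_PER_VALUE / 1e9     # a few temporal states: O(N)

print(f"full time history: {full_history_tb:.0f} TB")  # far beyond any single GPU
print(f"a few states:      {few_states_gb:.0f} GB")    # fits in an 80 GB A100
```

With these illustrative numbers, storing the full history would need terabytes, while a few temporal states stay within a single A100's memory.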

📝 Abstract
Dynamic optimization is currently limited by sensitivity computations that require information from the full forward and adjoint wave fields. Since the forward and adjoint solutions are computed in opposing time directions, the forward solution must be stored. This requires a substantial amount of memory for large-scale problems, even when using checkpointing or data compression techniques. As a result, the problem size is memory-bound rather than bound by wall-clock time when working with modern GPU-based implementations that have limited memory capacity. To overcome this limitation, we introduce a new approach for approximate sensitivity computation based on the adjoint method (for self-adjoint problems) that relies on the principle of superposition. The approximation allows an iterative computation of the sensitivity, reducing the memory burden to that of the solution at a small number of time steps, i.e., to the number of degrees of freedom. This enables sensitivity computations for problems with billions of degrees of freedom on current GPUs, such as NVIDIA's A100 (from 2020). We demonstrate the approach on full waveform inversion and transient acoustic topology optimization problems, relying on a highly efficient finite-difference forward solver implemented in CUDA. Phenomena such as damping cannot be considered, as the approximation technique is limited to self-adjoint problems.
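The abstract does not spell out the algorithm, but the general idea it gestures at — that for a lossless, self-adjoint wave equation the forward field can be reconstructed in reverse from its final states while the adjoint field is propagated backward, so the sensitivity is accumulated with only O(N) memory — can be sketched on a 1D toy problem. Everything below (the discretization, the time-reversal reconstruction, the source/residual signals) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def laplacian(u):
    """Second-order finite-difference Laplacian in 1D, zero Dirichlet boundaries."""
    lap = np.zeros_like(u)
    lap[1:-1] = u[2:] - 2.0 * u[1:-1] + u[:-2]
    return lap

def gradient_full_storage(c2, src, res, nt, isrc, irec):
    """Reference adjoint gradient storing the whole forward history: O(T*N) memory."""
    n = c2.size
    u = np.zeros((nt + 1, n))
    for t in range(1, nt):                       # forward leapfrog: u_tt = c^2 u_xx + s
        u[t + 1] = 2*u[t] - u[t - 1] + c2 * laplacian(u[t])
        u[t + 1, isrc] += src[t]
    g = np.zeros(n)
    lam_next, lam = np.zeros(n), np.zeros(n)     # adjoint marches backward in time
    for t in range(nt - 1, 0, -1):
        lam_prev = 2*lam - lam_next + c2 * laplacian(lam)
        lam_prev[irec] += res[t]
        g += lam * (u[t + 1] - 2*u[t] + u[t - 1])  # correlate lam with u_tt
        lam_next, lam = lam, lam_prev
    return g

def gradient_time_reversed(c2, src, res, nt, isrc, irec):
    """Memory-lean variant keeping only a handful of states: O(N) memory.
    The forward field is rebuilt backward by time reversal, which works
    only for the lossless (self-adjoint) wave equation -- no damping."""
    n = c2.size
    u_prev, u = np.zeros(n), np.zeros(n)         # forward pass, keep last two states
    for t in range(1, nt):
        u_next = 2*u - u_prev + c2 * laplacian(u)
        u_next[isrc] += src[t]
        u_prev, u = u, u_next
    u_tp1, u_t = u, u_prev                       # final states u[nt], u[nt-1]
    g = np.zeros(n)
    lam_next, lam = np.zeros(n), np.zeros(n)
    for t in range(nt - 1, 0, -1):
        u_tm1 = 2*u_t - u_tp1 + c2 * laplacian(u_t)  # reverse the forward stencil
        u_tm1[isrc] += src[t]                        # re-inject the known source
        lam_prev = 2*lam - lam_next + c2 * laplacian(lam)
        lam_prev[irec] += res[t]
        g += lam * (u_tp1 - 2*u_t + u_tm1)
        u_tp1, u_t = u_t, u_tm1
        lam_next, lam = lam, lam_prev
    return g

# Tiny driver: both variants should agree to floating-point round-off.
n, nt = 101, 200
c2 = np.full(n, 0.2)                             # c^2 dt^2 / dx^2, within CFL limit
src = np.sin(0.25 * np.arange(nt))               # arbitrary illustrative source
res = np.cos(0.15 * np.arange(nt))               # arbitrary illustrative residual
g_ref = gradient_full_storage(c2, src, res, nt, isrc=20, irec=80)
g_lean = gradient_time_reversed(c2, src, res, nt, isrc=20, irec=80)
print("max |difference|:", np.max(np.abs(g_ref - g_lean)))
```

The time-symmetric leapfrog stencil makes the backward reconstruction exact in exact arithmetic; adding a damping term would break this symmetry, which mirrors the abstract's restriction to self-adjoint problems. Note the paper's actual method additionally uses a superposition-based approximation not reproduced in this toy.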
Problem

Research questions and friction points this paper is trying to address.

Reducing memory usage for adjoint sensitivity computations
Enabling billion-parameter optimization on single GPU
Overcoming memory limitations in dynamic optimization problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adjoint method using superposition principle
Iterative sensitivity computation reducing memory
Enables billion-parameter optimization on single GPU
Leon Herrmann
Chair of Data Engineering in Construction, Bauhaus-Universität Weimar, Coudraystraße 13 b, 99423, Weimar, Germany
Tim Bürchner
Chair of Computational Modeling and Simulation, Technical University of Munich, School of Engineering and Design, Arcisstraße 21, Munich, 80333, Germany
László Kudela
Chair of Data Engineering in Construction, Bauhaus-Universität Weimar, Coudraystraße 13 b, 99423, Weimar, Germany
Stefan Kollmannsberger
Bauhaus-Universität Weimar
numerical mechanics, data-driven modeling, from modeling to analysis, additive manufacturing