Global Convergence of Adjoint-Optimized Neural PDEs

📅 2025-06-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work studies nonlinear parabolic PDEs whose source terms are modeled by neural networks, focusing on the global convergence of gradient descent training as both the hidden-layer width and the training time tend to infinity. Two mathematical obstacles stand in the way: the nonlocal neural-network integral operator arising in the limiting dynamics lacks a spectral gap, and the limiting PDE system is nonconvex. The authors nevertheless give a rigorous proof that the optimization converges to a global minimizer, i.e., the solution matching the target data. The approach combines adjoint-PDE-based gradient computation, infinite-width mean-field dynamics, nonlinear PDE theory, and variational spectral analysis, yielding the first global convergence guarantee for such neural PDE models. Numerical experiments corroborate the predicted convergence rate and the method's accuracy, laying a rigorous foundation for reliable and interpretable training of physics-informed neural networks.
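In generic notation, the adjoint-based gradient the summary refers to takes the following form. The quadratic terminal-time loss, diffusion term, and boundary setup below are assumptions for illustration; the paper treats a broader class of nonlinear parabolic PDEs with a neural-network source term f(·;θ):

```latex
% Forward model, loss, adjoint, and gradient (generic sketch, assumed notation):
\begin{align*}
  \partial_t u &= \nu \Delta u + f(u;\theta), \qquad u(0,\cdot) = u_0,\\
  J(\theta)    &= \tfrac{1}{2}\int_\Omega \bigl(u(T,x;\theta) - u^\ast(x)\bigr)^2 \,dx,\\
  -\partial_t p &= \nu \Delta p + \partial_u f(u;\theta)\,p, \qquad p(T,\cdot) = u(T,\cdot;\theta) - u^\ast,\\
  \nabla_\theta J &= \int_0^T\!\!\int_\Omega p(t,x)\,\nabla_\theta f\bigl(u(t,x);\theta\bigr)\,dx\,dt .
\end{align*}
```

One forward PDE solve plus one backward adjoint solve thus yields the full parameter gradient, regardless of the number of network parameters, which is what makes this calibration computationally efficient.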

📝 Abstract
Many engineering and scientific fields have recently become interested in modeling terms in partial differential equations (PDEs) with neural networks. The resulting neural-network PDE model, being a function of the neural network parameters, can be calibrated to available data by optimizing over the PDE using gradient descent, where the gradient is evaluated in a computationally efficient manner by solving an adjoint PDE. These neural-network PDE models have emerged as an important research area in scientific machine learning. In this paper, we study the convergence of the adjoint gradient descent optimization method for training neural-network PDE models in the limit where both the number of hidden units and the training time tend to infinity. Specifically, for a general class of nonlinear parabolic PDEs with a neural network embedded in the source term, we prove convergence of the trained neural-network PDE solution to the target data (i.e., a global minimizer). The global convergence proof poses a unique mathematical challenge that is not encountered in finite-dimensional neural network convergence analyses due to (1) the neural network training dynamics involving a non-local neural network kernel operator in the infinite-width hidden layer limit, where the kernel lacks a spectral gap for its eigenvalues, and (2) the nonlinearity of the limit PDE system, which leads to a non-convex optimization problem even in the infinite-width hidden layer limit (unlike typical neural network training cases, where the optimization problem becomes convex in the large-neuron limit). The theoretical results are illustrated and empirically validated by numerical studies.
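To make the adjoint optimization loop concrete, here is a minimal NumPy sketch under strong simplifying assumptions: a 1D heat equation with zero Dirichlet boundaries, explicit Euler time stepping, a single-hidden-layer tanh source term with mean-field 1/N scaling, a synthetic target, and an ad hoc learning rate. None of these choices come from the paper; the sketch only illustrates the forward-solve / backward-adjoint / gradient-step pattern described in the abstract.

```python
import numpy as np

# Grid and model sizes (illustrative choices, not taken from the paper).
rng = np.random.default_rng(0)
Jx, NT, T, nu = 64, 200, 0.1, 0.05
dx, dt = 1.0 / (Jx + 1), T / NT        # nu*dt/dx**2 ~ 0.1, so explicit Euler is stable
x = np.linspace(dx, 1.0 - dx, Jx)

N = 100                                # hidden-layer width
a, w, b = rng.normal(size=(3, N))      # network parameters theta = (a, w, b)

u0 = np.sin(np.pi * x)                 # initial condition
u_star = 0.5 * np.sin(np.pi * x)       # synthetic target data (hypothetical)

def lap(v):                            # Dirichlet Laplacian via zero ghost points
    return (np.r_[v[1:], 0.0] - 2.0 * v + np.r_[0.0, v[:-1]]) / dx**2

def f(u):                              # pointwise NN source with mean-field 1/N scaling
    return np.tanh(np.outer(u, w) + b) @ a / N

def f_u(u):                            # derivative of f with respect to its input u
    s = 1.0 - np.tanh(np.outer(u, w) + b) ** 2
    return (s * w) @ a / N

lr = 100.0                             # ad hoc step size for this sketch
for it in range(301):
    # Forward solve (explicit Euler), storing the trajectory for the adjoint pass.
    traj = [u0]
    for _ in range(NT):
        u = traj[-1]
        traj.append(u + dt * (nu * lap(u) + f(u)))
    if it % 100 == 0:
        print(f"iter {it:3d}  loss {0.5 * dx * np.sum((traj[-1] - u_star) ** 2):.3e}")

    # Discrete adjoint sweep: p holds dJ/du^{n+1}, marched backward in time.
    p = dx * (traj[-1] - u_star)       # terminal condition from the quadratic loss
    ga, gw, gb = np.zeros(N), np.zeros(N), np.zeros(N)
    for n in reversed(range(NT)):
        u = traj[n]
        th = np.tanh(np.outer(u, w) + b)
        s = 1.0 - th ** 2
        # Accumulate dJ/dtheta += dt * p^{n+1} . (df(u^n)/dtheta).
        ga += dt * (p @ th) / N
        gw += dt * a * ((p * u) @ s) / N
        gb += dt * a * (p @ s) / N
        # Adjoint step: p^n = (I + dt * (nu * L + diag(f_u(u^n)))) p^{n+1}.
        p = p + dt * (nu * lap(p) + f_u(u) * p)

    a -= lr * ga; w -= lr * gw; b -= lr * gb   # plain gradient descent
```

The backward sweep is the discrete counterpart of the adjoint PDE above: storing the forward trajectory lets the gradient of every parameter be accumulated in a single reverse pass.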
Problem

Research questions and friction points this paper is trying to address.

Convergence of adjoint gradient descent for neural-network PDE models
Training neural-network PDE models in the infinite-width limit
Proving convergence to a global minimizer for nonlinear parabolic PDEs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adjoint PDEs yield efficient gradients for training neural PDE models
Global convergence proven in the infinite-width, infinite-time limit
Analysis handles a non-local kernel operator with no spectral gap and a non-convex limit system (see the sketch below)
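For intuition on the missing spectral gap, here is a standard mean-field/NTK-style sketch; the specific kernel below is an illustrative assumption, not necessarily the paper's exact operator. For a width-N source term f(u;θ) = (1/N) Σᵢ aᵢ σ(wᵢu + bᵢ), the infinite-width training dynamics are driven by the integral operator with kernel

```latex
% Illustrative NTK-style kernel (assumed form, not taken from the paper):
K(u,v) \;=\; \mathbb{E}_{(a,w,b)\sim\mu}\Bigl[\sigma(wu+b)\,\sigma(wv+b)
       \;+\; a^{2}\,\sigma'(wu+b)\,\sigma'(wv+b)\,(uv+1)\Bigr].
```

Such an operator is symmetric, positive semidefinite, and (under mild conditions on σ and μ) compact, so its eigenvalues accumulate at zero. With no uniform lower bound λ_min > 0, exponential loss decay cannot be read off from a spectral-gap argument, which is why the paper turns to variational spectral analysis instead.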