Dual Natural Gradient Descent for Scalable Training of Physics-Informed Neural Networks

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the O(n³) time-complexity bottleneck in second-order natural-gradient training of Physics-Informed Neural Networks (PINNs), which arises from solving Gauss–Newton updates in the high-dimensional parameter space, this work proposes Dual Natural Gradient Descent (D-NGD). D-NGD shifts the natural-gradient update to a low-dimensional residual space, augments it with a geodesic-acceleration correction, and pairs it with a Nyström-preconditioned conjugate-gradient solver. This is the first method to enable efficient natural-gradient training of PINNs with up to 12.8 million parameters on a single GPU. Experiments show that D-NGD reduces the final L² error by one to three orders of magnitude relative to Adam, SGD, and quasi-Newton methods, substantially lowering the computational and scalability barriers to higher-order optimization in large-scale PINNs.
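The geodesic-acceleration correction mentioned above can be sketched with the standard finite-difference recipe (in the style of Transtrum and Sethna's work on Levenberg–Marquardt), shown here in the primal parameter-space form for clarity rather than the paper's residual-space form. The function names, step size, and damping constant are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def geodesic_acceleration(residual, jac, theta, velocity, h=1e-4, damping=1e-8):
    """Second-order correction to a Gauss-Newton step (geodesic acceleration).

    The directional second derivative of the residuals along `velocity` is
    estimated with a one-sided finite difference; a second solve against the
    same normal matrix as the Gauss-Newton step then yields the correction.
    """
    J = jac(theta)
    r0 = residual(theta)
    r1 = residual(theta + h * velocity)
    # k approximates d^2 r(theta + t*velocity) / dt^2 at t = 0
    k = (2.0 / h) * ((r1 - r0) / h - J @ velocity)
    A = J.T @ J + damping * np.eye(theta.size)
    return np.linalg.solve(A, -J.T @ k)

# toy residual r_i(theta) = theta_i^2, for which the finite difference is exact
residual = lambda th: th ** 2
jac = lambda th: np.diag(2.0 * th)
theta = np.array([1.0, 2.0])
velocity = np.array([0.1, 0.1])
accel = geodesic_acceleration(residual, jac, theta, velocity)
```

For this quadratic toy residual the directional second derivative is `2 * velocity**2` in each component, so the correction can be checked by hand.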

📝 Abstract
Natural-gradient methods markedly accelerate the training of Physics-Informed Neural Networks (PINNs), yet their Gauss--Newton update must be solved in the parameter space, incurring a prohibitive $O(n^3)$ time complexity, where $n$ is the number of network trainable weights. We show that exactly the same step can instead be formulated in a generally smaller residual space of size $m = \sum_{\gamma} N_{\gamma} d_{\gamma}$, where each residual class $\gamma$ (e.g. PDE interior, boundary, initial data) contributes $N_{\gamma}$ collocation points of output dimension $d_{\gamma}$. Building on this insight, we introduce \textit{Dual Natural Gradient Descent} (D-NGD). D-NGD computes the Gauss--Newton step in residual space, augments it with a geodesic-acceleration correction at negligible extra cost, and provides both a dense direct solver for modest $m$ and a Nyström-preconditioned conjugate-gradient solver for larger $m$. Experimentally, D-NGD scales second-order PINN optimization to networks with up to 12.8 million parameters, delivers one- to three-order-of-magnitude lower final $L^2$ error than first-order methods (Adam, SGD) and quasi-Newton methods, and -- crucially -- enables natural-gradient training of PINNs at this scale on a single GPU.
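The core dual-space idea can be sketched in a few lines of NumPy: the push-through identity $(J^\top J + cI)^{-1}J^\top = J^\top(JJ^\top + cI)^{-1}$ lets the damped Gauss--Newton step be obtained from an $m \times m$ system instead of an $n \times n$ one. This is a minimal sketch of the general principle; the function name and damping constant are illustrative, not the paper's implementation:

```python
import numpy as np

def dual_gauss_newton_step(J, r, damping=1e-6):
    """Damped Gauss-Newton step computed in residual space.

    Primal form: solve (J^T J + damping*I) delta = -J^T r   -- an n x n system.
    Dual form:   solve (J J^T + damping*I) lam   = -r       -- an m x m system,
    then map back with delta = J^T lam. Both give the same step, but for
    m << n the dual solve is far cheaper.
    """
    m = J.shape[0]
    K = J @ J.T + damping * np.eye(m)   # m x m Gram matrix in residual space
    lam = np.linalg.solve(K, -r)
    return J.T @ lam

# sanity check: for m << n the dual step matches the primal one
rng = np.random.default_rng(0)
J = rng.standard_normal((5, 50))        # m = 5 residuals, n = 50 parameters
r = rng.standard_normal(5)
delta_dual = dual_gauss_newton_step(J, r)
delta_primal = np.linalg.solve(J.T @ J + 1e-6 * np.eye(50), -J.T @ r)
```

Here the 5×5 residual-space solve replaces a 50×50 parameter-space solve; at PINN scale the gap is millions of parameters versus tens of thousands of collocation residuals.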
Problem

Research questions and friction points this paper is trying to address.

Reducing the O(n^3) complexity of natural-gradient PINN training
Enabling scalable second-order optimization for large PINNs
Significantly improving accuracy over first-order methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

D-NGD computes the Gauss-Newton step in the low-dimensional residual space
Adds a geodesic-acceleration correction at negligible extra cost
Enables natural-gradient PINN training at up to 12.8M parameters on a single GPU