Neural Actor-Critic Methods for Hamilton-Jacobi-Bellman PDEs: Asymptotic Analysis and Numerical Studies

📅 2025-07-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the numerical solution of high-dimensional Hamilton–Jacobi–Bellman (HJB) equations arising in stochastic optimal control. Methodologically, the authors propose a neural network-based actor–critic algorithm: the critic network is designed with an architecture that automatically satisfies the boundary condition, and a biased-gradient technique is introduced to reduce computational cost; the actor update minimizes the integral of the Hamiltonian over the domain, with the Hamiltonian estimated using the critic. Theoretically, they prove that, in the infinite-width limit, the training dynamics converge to an infinite-dimensional ordinary differential equation whose fixed points coincide with solutions of the original stochastic control problem. Experimentally, the method achieves high accuracy and robustness on problems of up to 200 dimensions, including cases with non-convex Hamiltonians as well as linear-quadratic regulators, significantly extending the dimensionality frontier for deep learning–based HJB solvers.

📝 Abstract
We mathematically analyze and numerically study an actor-critic machine learning algorithm for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) partial differential equations from stochastic control theory. The architecture of the critic (the estimator for the value function) is structured so that the boundary condition is always perfectly satisfied (rather than being included in the training loss) and utilizes a biased gradient which reduces computational cost. The actor (the estimator for the optimal control) is trained by minimizing the integral of the Hamiltonian over the domain, where the Hamiltonian is estimated using the critic. We show that the training dynamics of the actor and critic neural networks converge in a Sobolev-type space to a certain infinite-dimensional ordinary differential equation (ODE) as the number of hidden units in the actor and critic $\rightarrow \infty$. Further, under a convexity-like assumption on the Hamiltonian, we prove that any fixed point of this limit ODE is a solution of the original stochastic control problem. This provides an important guarantee for the algorithm's performance in light of the fact that finite-width neural networks may only converge to local minimizers (and not optimal solutions) due to the non-convexity of their loss functions. In our numerical studies, we demonstrate that the algorithm can solve stochastic control problems accurately in up to 200 dimensions. In particular, we construct a series of increasingly complex stochastic control problems with known analytic solutions and study the algorithm's numerical performance on them. These problems range from a linear-quadratic regulator equation to highly challenging equations with non-convex Hamiltonians, allowing us to identify and analyze the strengths and limitations of this neural actor-critic method for solving HJB equations.
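The actor update described in the abstract (minimizing the integral of the Hamiltonian over the domain, with the value gradient supplied by the critic) can be illustrated with a minimal Monte Carlo sketch. Everything below is a hypothetical stand-in, not the paper's implementation: the LQR-style Hamiltonian, the quadratic value function, and the linear feedback actor are chosen only so the example has a known pointwise minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2  # state dimension (the paper scales to 200)

def grad_V(x):
    # placeholder for the critic's gradient; here V(x) = |x|^2, so grad V = 2x
    return 2.0 * x

def hamiltonian(x, a, p):
    # illustrative LQR-style Hamiltonian: H = p.(x + a) + |a|^2 / 2
    return np.sum(p * (x + a), axis=-1) + 0.5 * np.sum(a**2, axis=-1)

def actor(x, W):
    # linear feedback control a(x) = x W, a stand-in for the actor network
    return x @ W

def actor_loss(W, n=4096):
    # Monte Carlo estimate of the integral of the Hamiltonian over [-1, 1]^d
    x = rng.uniform(-1.0, 1.0, size=(n, d))
    return np.mean(hamiltonian(x, actor(x, W), grad_V(x)))

# For this Hamiltonian the pointwise minimizer is a*(x) = -grad_V(x) = -2x,
# i.e. W = -2I, whose loss is below that of, e.g., the zero control.
W_opt = -2.0 * np.eye(d)
W_zero = np.zeros((d, d))
print(actor_loss(W_opt) < actor_loss(W_zero))  # True
```

Note that the loss depends on the actor only through sampled states, so no explicit minimization over the control at each point is required; gradient descent on the actor parameters suffices.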
Problem

Research questions and friction points this paper is trying to address.

Analyzing actor-critic methods for solving high-dimensional HJB equations.
Ensuring boundary conditions and reducing computational costs in critic architecture.
Demonstrating algorithm performance in stochastic control up to 200 dimensions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Critic architecture enforces exact boundary conditions
Biased gradient reduces computational cost
Actor minimizes Hamiltonian integral using critic
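The first innovation, a critic architecture that enforces the boundary condition exactly rather than penalizing it in the loss, can be sketched with a standard hard-constraint ansatz: write the critic as the boundary data plus a factor that vanishes on the boundary times a free network. The domain (unit disk), boundary data `g`, and tiny network below are hypothetical choices for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2  # state dimension (the paper scales to 200)

def g(x):
    # hypothetical boundary data prescribed on the unit circle
    return np.sum(x**2, axis=-1)

def mlp(x, params):
    # tiny one-hidden-layer network standing in for the critic's free part
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

params = (0.1 * rng.normal(size=(d, 16)), np.zeros(16),
          0.1 * rng.normal(size=(16, 1)), np.zeros(1))

def critic(x, params):
    # the factor (1 - |x|^2) vanishes on the unit circle, so the
    # boundary condition V = g holds exactly for any network weights
    bulk = (1.0 - np.sum(x**2, axis=-1, keepdims=True)) * mlp(x, params)
    return g(x)[..., None] + bulk

# sample points on the boundary: the critic must match g there
theta = rng.uniform(0.0, 2.0 * np.pi, size=100)
xb = np.stack([np.cos(theta), np.sin(theta)], axis=-1)
err = np.max(np.abs(critic(xb, params)[:, 0] - g(xb)))
print(err)  # near machine precision, independent of the weights
```

Because the constraint holds by construction, training can focus entirely on the PDE residual in the interior, with no boundary penalty term to balance.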
Samuel N. Cohen
Professor of Mathematics, University of Oxford
Stochastic analysis, mathematical finance
Jackson Hebner
Mathematical Institute, University of Oxford
Deqing Jiang
Mathematical Institute, University of Oxford
Justin Sirignano
University of Oxford
mathematical finance, finance, machine learning