Neural Policy Iteration for Stochastic Optimal Control: A Physics-Informed Approach

📅 2025-08-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses stochastic optimal control problems by proposing a physics-informed neural network (PINN)-based policy iteration framework for nonlinear stochastic systems governed by second-order Hamilton–Jacobi–Bellman (HJB) equations. The method parameterizes the value function using a neural network and performs value evaluation via L²-minimization of the linear PDE residual induced by a fixed policy. Crucially, it establishes, for the first time, an explicit Lipschitz-type bound quantifying how value gradient estimation error propagates to policy update error—enhancing both interpretability and training stability. Theoretical analysis preserves the global exponential convergence guarantee of classical policy iteration. Empirical evaluation on benchmark tasks—including stochastic CartPole, a nonlinear pendulum, and a 10-dimensional linear-quadratic regulator—demonstrates the method’s effectiveness and scalability to high-dimensional settings.

Technology Category

Application Category

📝 Abstract
We propose a physics-informed neural network policy iteration (PINN-PI) framework for solving stochastic optimal control problems governed by second-order Hamilton--Jacobi--Bellman (HJB) equations. At each iteration, a neural network is trained to approximate the value function by minimizing the residual of a linear PDE induced by a fixed policy. This linear structure enables systematic $L^2$ error control at each policy evaluation step, and allows us to derive explicit Lipschitz-type bounds that quantify how value gradient errors propagate to the policy updates. This interpretability provides a theoretical basis for evaluating policy quality during training. Our method extends recent deterministic PINN-based approaches to stochastic settings, inheriting the global exponential convergence guarantees of classical policy iteration under mild conditions. We demonstrate the effectiveness of our method on several benchmark problems, including stochastic cartpole, pendulum problems and high-dimensional linear quadratic regulation (LQR) problems in up to 10D.
Problem

Research questions and friction points this paper is trying to address.

Solves stochastic optimal control via physics-informed neural networks
Addresses second-order Hamilton-Jacobi-Bellman equation challenges
Extends deterministic approaches to stochastic settings with convergence guarantees
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-informed neural network for stochastic control
Systematic L2 error control in policy evaluation
Explicit Lipschitz bounds for error propagation
🔎 Similar Papers
No similar papers found.
Yeongjong Kim
Yeongjong Kim
POSTECH CM2LA Postdoc
Machine LearningOptimizationReinforcement LearningBanditMarkov Decision Process
Yeoneung Kim
Yeoneung Kim
SeoulTech
mathematicsmachine learning
M
Minseok Kim
Department of Applied Artificial Intelligence, Seoul National University of Science and Technology
N
Namkyeong Cho
Department of Financial Mathematics, Gachon University