Convergence of Neural Network Policies for Risk–Reward Optimization

📅 2026-03-06
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of optimizing constrained two-step feedback policies in multi-period risk–reward stochastic control problems whose policies may be discontinuous in the state. The authors propose an unconstrained parameterization based on two coupled feedforward neural networks, in which constraint-enforcing output layers, auxiliary variables for risk measures (e.g., CVaR), and performance vectors for general objectives together reformulate the original constrained problem as a standard neural network training task. They establish, for the first time, a probabilistic convergence theory for neural network approximation of discontinuous, constrained policies, modularly decoupling policy approximation, recursive propagation, and objective preservation. Numerical experiments confirm the theoretical convergence: heat maps of learned policies closely match reference solutions and remain robust on large out-of-sample scenario sets.
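For context, the auxiliary-variable device mentioned for CVaR is, in its standard Rockafellar–Uryasev form (stated here as background; the paper's exact notation may differ):

```latex
% Rockafellar--Uryasev representation of CVaR at level \alpha.
% The scalar t attains the minimum at the Value-at-Risk, so in a neural
% formulation t can simply be trained jointly with the network weights.
\[
\mathrm{CVaR}_{\alpha}(X)
  = \min_{t \in \mathbb{R}}
    \Big\{\, t + \tfrac{1}{1-\alpha}\,\mathbb{E}\big[(X - t)^{+}\big] \Big\},
\qquad (x)^{+} := \max(x, 0).
\]
```

This is what turns the risk measure into a trainable objective: the inner minimization over t is absorbed into the same gradient descent that optimizes the policy weights.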

πŸ“ Abstract
We develop a neural-network framework for multi-period risk–reward stochastic control problems with constrained two-step feedback policies that may be discontinuous in the state. We allow a broad class of objectives built on a finite-dimensional performance vector, including terminal and path-dependent statistics, with risk functionals admitting auxiliary-variable optimization representations (e.g., Conditional Value-at-Risk and buffered probability of exceedance) and optional moment dependence. Our approach parametrizes the two-step policy using two coupled feedforward networks with constraint-enforcing output layers, reducing the constrained control problem to unconstrained training over network parameters. Under mild regularity conditions, we prove that the empirical optimum of the NN-parametrized objective converges in probability to the true optimal value as network capacity and training sample size increase. The proof is modular, separating policy approximation, propagation through the controlled recursion, and preservation under the scalarized risk–reward objective. Numerical experiments confirm the predicted convergence-in-probability behavior, show close agreement between learned and reference control heat maps, and demonstrate out-of-sample robustness on a large independent scenario set.
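To make the architectural idea concrete, here is a minimal PyTorch sketch of the two ingredients the abstract names: two coupled feedforward networks whose output layers enforce constraints by construction, and a trainable auxiliary variable for an empirical CVaR objective. Everything below (dimensions, the softmax/sigmoid constraint maps, the toy loss, training settings) is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch, NOT the paper's exact architecture: a two-step
# feedback policy given by two coupled feedforward networks with
# constraint-enforcing output layers, plus a trainable CVaR auxiliary
# variable t in the Rockafellar--Uryasev sense.
import torch
import torch.nn as nn

class TwoStepPolicy(nn.Module):
    def __init__(self, state_dim=2, hidden=32, n_assets=3):
        super().__init__()
        self.net1 = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, n_assets))
        # the second network is coupled: it sees the state and the first action
        self.net2 = nn.Sequential(nn.Linear(state_dim + n_assets, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))
        self.t = nn.Parameter(torch.zeros(1))  # CVaR auxiliary variable

    def forward(self, x):
        a1 = torch.softmax(self.net1(x), dim=-1)               # simplex constraint
        a2 = torch.sigmoid(self.net2(torch.cat([x, a1], -1)))  # box constraint [0, 1]
        return a1, a2

def empirical_cvar_objective(policy, x, loss_fn, alpha=0.95):
    """Sample-average (empirical) CVaR of per-scenario losses."""
    a1, a2 = policy(x)
    losses = loss_fn(x, a1, a2)                                # shape: (n_scenarios,)
    excess = torch.clamp(losses - policy.t, min=0.0)
    return policy.t + excess.mean() / (1.0 - alpha)

# Toy usage: minimize the empirical CVaR of a synthetic loss over
# Monte Carlo scenarios (the loss function here is a made-up stand-in
# for the controlled recursion's performance vector).
torch.manual_seed(0)
policy = TwoStepPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
x = torch.randn(4096, 2)                                       # simulated states
toy_loss = lambda x, a1, a2: ((x * x).sum(-1) * a2.squeeze(-1)
                              - (a1 * x.mean(-1, keepdim=True)).sum(-1))
for step in range(200):
    opt.zero_grad()
    obj = empirical_cvar_objective(policy, x, toy_loss)
    obj.backward()
    opt.step()
```

The design point is that the constraints live entirely in the output layers (softmax maps onto the simplex, sigmoid onto a box), so the optimizer only ever sees unconstrained parameters, which mirrors the paper's reduction of constrained control to ordinary network training.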
Problem

Research questions and friction points this paper is trying to address.

risk-reward optimization
stochastic control
constrained policies
neural networks
convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

neural network policy
risk-reward optimization
constrained stochastic control
convergence in probability
auxiliary-variable risk representation
Chang Chen
School of Mathematics and Physics, The University of Queensland, St Lucia, Brisbane 4072, Australia
Duy-Minh Dang
The University of Queensland
Scientific Computing - Computational Finance - GPU Parallel Computing