🤖 AI Summary
To address the curse of dimensionality in solving the Bellman equation for high-dimensional Markov decision processes (MDPs), this paper studies infinite-horizon stochastic optimal control problems with finite control sets and constructs deep neural network (DNN) approximations of the associated Q-function. The analysis employs deep networks with leaky ReLU activations within the full-history recursive multilevel fixed-point (MLFP) approximation framework, under the assumption that both the payoff function and the random state transition dynamics can themselves be suitably approximated by such networks. Crucially, under appropriate regularity conditions, the paper establishes that the Q-function can be approximated in the L²-norm with polynomial complexity: the number of network parameters grows at most polynomially in both the state dimension and ε⁻¹, where ε is the approximation error tolerance. This theoretical guarantee supports the use of deep networks in reinforcement learning and shows that such approximations can overcome the curse of dimensionality in high-dimensional control problems.
📝 Abstract
Discrete-time stochastic optimal control problems and Markov decision processes (MDPs) are fundamental models for sequential decision-making under uncertainty and as such provide the mathematical framework underlying reinforcement learning theory. A central tool for solving MDPs is the Bellman equation and its solution, the so-called $Q$-function. In this article, we construct deep neural network (DNN) approximations for $Q$-functions associated to MDPs with infinite time horizon and finite control set $A$. More specifically, we show that if the payoff function and the random transition dynamics of the MDP can be suitably approximated by DNNs with leaky rectified linear unit (ReLU) activation, then the solutions $Q_d\colon \mathbb{R}^d \to \mathbb{R}^{|A|}$, $d\in \mathbb{N}$, of the associated Bellman equations can also be approximated in the $L^2$-sense by DNNs with leaky ReLU activation whose numbers of parameters grow at most polynomially in both the dimension $d\in \mathbb{N}$ of the state space and the reciprocal $1/\varepsilon$ of the prescribed error $\varepsilon\in (0,1)$. Our proof relies on the recently introduced full-history recursive multilevel fixed-point (MLFP) approximation scheme.
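The Bellman fixed point that the paper's approximation scheme targets can be illustrated, in a deliberately simplified setting, by plain $Q$-iteration. The sketch below uses a finite state space and a randomly generated toy MDP (an assumption for illustration only; the paper works with continuous state spaces $\mathbb{R}^d$, DNN approximations, and the MLFP scheme, none of which appear here). The point is only that the Bellman operator is a $\gamma$-contraction, so repeated application converges to the $Q$-function:

```python
import numpy as np

# Toy illustration (assumed setup, not the paper's construction):
# finite state space {0,...,S-1}, finite control set A = {0,...,|A|-1},
# discount factor gamma in (0,1), payoff r(s,a), transition kernel P(s'|s,a).
rng = np.random.default_rng(0)
S, A, gamma = 5, 2, 0.9
r = rng.standard_normal((S, A))        # payoff function
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)      # make transitions row-stochastic

# Q-iteration: repeatedly apply the Bellman operator
#   (T Q)(s, a) = r(s, a) + gamma * E[ max_{a'} Q(s', a') ]
Q = np.zeros((S, A))
for _ in range(1000):
    Q_next = r + gamma * P @ Q.max(axis=1)   # batched matmul over (s, a)
    if np.max(np.abs(Q_next - Q)) < 1e-10:
        Q = Q_next
        break
    Q = Q_next

# The limit satisfies Q = T Q up to tolerance (Banach fixed-point theorem,
# since T is a gamma-contraction in the sup-norm).
residual = np.max(np.abs(Q - (r + gamma * P @ Q.max(axis=1))))
```

In high dimensions, tabulating $Q$ like this is exactly what the curse of dimensionality forbids; the paper's contribution is that a DNN representation of the fixed point needs only polynomially many parameters in $d$ and $1/\varepsilon$.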