Neural Value Iteration

📅 2025-11-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Scalable planning for large partially observable Markov decision processes (POMDPs) remains hindered by the exponential cost of exact Bellman backups and the limited scalability of traditional offline solvers.

Method: The paper introduces a neural parameterization of α-vectors that explicitly encodes the piecewise-linear convex structure of the value function in a differentiable neural architecture, enabling scalable, differentiable value iteration. The approach integrates point-based value iteration with neural function approximation while retaining the classical value iteration framework for efficient Bellman backups.

Contribution/Results: Experiments show that the method yields near-optimal policies on ultra-large-scale POMDPs beyond the reach of conventional offline solvers, with substantial gains in computational efficiency and state-space scalability.
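The core idea described above, a set of neural α-functions whose belief-weighted maximum preserves the PWLC form, can be sketched as follows. This is a minimal numpy illustration; the architecture, the per-state features, and all names here are assumptions for exposition, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, x):
    """Tiny 2-layer network: per-state features -> scalar value per state."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return (h @ W2 + b2).ravel()  # shape: (num_states,)

def init_params(in_dim, hidden=8):
    return (rng.normal(size=(in_dim, hidden)), np.zeros(hidden),
            rng.normal(size=(hidden, 1)), np.zeros(1))

num_states, feat_dim = 100, 4
state_feats = rng.normal(size=(num_states, feat_dim))   # per-state features
alpha_nets = [init_params(feat_dim) for _ in range(3)]  # neural "alpha-vectors"

def value(belief):
    # Evaluate each neural alpha-function at every state, take the
    # expectation under the belief, then the max over the set. Each term is
    # linear in the belief, so the max is piecewise-linear and convex.
    return max(float(belief @ mlp(p, state_feats)) for p in alpha_nets)

b = np.full(num_states, 1.0 / num_states)  # uniform belief
print(value(b))
```

Because each network contributes a fixed per-state value profile, the belief enters only through a linear expectation, so the PWLC structure of the exact value function is preserved by construction.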


📝 Abstract
The value function of a POMDP exhibits the piecewise-linear-convex (PWLC) property and can be represented as a finite set of hyperplanes, known as α-vectors. Most state-of-the-art POMDP solvers (offline planners) follow the point-based value iteration scheme, which performs Bellman backups on α-vectors at reachable belief points until convergence. However, since each α-vector is |S|-dimensional, these methods quickly become intractable for large-scale problems due to the prohibitive computational cost of Bellman backups. In this work, we demonstrate that the PWLC property allows a POMDP's value function to be alternatively represented as a finite set of neural networks. This insight enables a novel POMDP planning algorithm called *Neural Value Iteration*, which combines the generalization capability of neural networks with the classical value iteration framework. Our approach achieves near-optimal solutions even in extremely large POMDPs that are intractable for existing offline solvers.
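The point-based backup the abstract refers to can be sketched in a few lines. This is an illustrative numpy version of the standard PBVI-style backup; the tensor names `T`, `O`, `R` and the toy problem sizes are our own, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, Z = 5, 2, 3          # states, actions, observations
gamma = 0.95
T = rng.dirichlet(np.ones(S), size=(A, S))  # T[a, s, s'] = P(s' | s, a)
O = rng.dirichlet(np.ones(Z), size=(A, S))  # O[a, s', o] = P(o | s', a)
R = rng.normal(size=(A, S))                 # R[a, s] = reward

def backup(b, Gamma):
    """One point-based Bellman backup at belief b over alpha-vector set Gamma."""
    best_val, best_alpha = -np.inf, None
    for a in range(A):
        alpha_a = R[a].copy()
        for o in range(Z):
            # Project every alpha-vector through the (a, o) dynamics ...
            proj = [T[a] @ (O[a][:, o] * alpha) for alpha in Gamma]
            # ... and keep the projection that is best at this belief point.
            alpha_a += gamma * max(proj, key=lambda g: b @ g)
        if b @ alpha_a > best_val:
            best_val, best_alpha = b @ alpha_a, alpha_a
    return best_alpha

Gamma = [np.zeros(S)]           # start from the zero value function
b = np.full(S, 1.0 / S)         # uniform belief point
new_alpha = backup(b, Gamma)    # each alpha-vector is |S|-dimensional
```

The |S|-sized vectors and the sums over states inside `backup` are exactly the cost that explodes for large state spaces, which is the bottleneck the neural representation targets.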
Problem

Research questions and friction points this paper is trying to address.

Overcoming computational intractability in large-scale POMDP planning problems
Replacing traditional α-vector representations with neural networks
Enabling near-optimal solutions for extremely large POMDPs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Representing the value function with neural networks
Combining neural networks with value iteration
Solving large-scale POMDPs with neural representations
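The combination listed above can be illustrated generically: fit a value hyperplane to Bellman-backup targets at sampled belief points. In the neural setting the linear map below would be replaced by a network; the data here are synthetic stand-ins and the setup is our own sketch, not the paper's training procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
S, N = 50, 200
beliefs = rng.dirichlet(np.ones(S), size=N)  # sampled belief points
targets = rng.normal(size=N)                 # stand-in backup values V(b)

# Least-squares fit of one alpha hyperplane: V_hat(b) = b @ alpha.
alpha, *_ = np.linalg.lstsq(beliefs, targets, rcond=None)

mse = float(np.mean((beliefs @ alpha - targets) ** 2))
baseline = float(np.mean(targets ** 2))      # error of predicting zero
```

Fitting a parametric function to backed-up values at sampled beliefs, rather than storing an |S|-dimensional vector per backup, is what lets this family of methods generalize across states.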