🤖 AI Summary
Scalable planning for large-scale partially observable Markov decision processes (POMDPs) remains hindered by the exponential computational complexity of exact Bellman backups and the limited scalability of traditional offline solvers.
Method: This paper introduces, for the first time, a neural parameterization of α-vectors that explicitly encodes the piecewise-linear convex structure of the value function into a differentiable neural architecture, enabling scalable, differentiable value iteration. The approach integrates point-based value iteration principles with neural function approximation while retaining the classical value iteration framework for efficient Bellman backups.
Contribution/Results: Experiments demonstrate that the method yields near-optimal policies on ultra-large-scale POMDPs—beyond the reach of conventional offline solvers—achieving substantial gains in computational efficiency and state-space scalability. It effectively overcomes the computational bottleneck inherent in high-dimensional POMDP planning.
📝 Abstract
The value function of a POMDP exhibits the piecewise-linear-convex (PWLC) property and can be represented as a finite set of hyperplanes, known as $\alpha$-vectors. Most state-of-the-art POMDP solvers (offline planners) follow the point-based value iteration scheme, which performs Bellman backups on $\alpha$-vectors at reachable belief points until convergence. However, since each $\alpha$-vector is $|S|$-dimensional, these methods quickly become intractable for large-scale problems due to the prohibitive computational cost of Bellman backups. In this work, we demonstrate that the PWLC property allows a POMDP's value function to be alternatively represented as a finite set of neural networks. This insight enables a novel POMDP planning algorithm called \emph{Neural Value Iteration}, which combines the generalization capability of neural networks with the classical value iteration framework. Our approach achieves near-optimal solutions even in extremely large POMDPs that are intractable for existing offline solvers.
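To make the PWLC representation concrete, the following minimal sketch evaluates a value function as the maximum over $\alpha$-vectors, $V(b) = \max_{\alpha} \sum_{s} \alpha(s)\, b(s)$, on a toy two-state problem. It then evaluates the same function with each $\alpha$-vector replaced by a callable from state to value, which is where a neural network could plausibly be substituted for an $|S|$-dimensional table. The function names, the toy numbers, and the state-to-value interface are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List, Sequence

Belief = Sequence[float]  # probability distribution over states


def value_from_alpha_vectors(belief: Belief,
                             alpha_vectors: List[Sequence[float]]) -> float:
    """PWLC value: maximum over alpha-vectors of the dot product with the belief."""
    return max(sum(a * b for a, b in zip(alpha, belief))
               for alpha in alpha_vectors)


def value_from_state_functions(belief: Belief,
                               states: Sequence[int],
                               alpha_fns: List[Callable[[int], float]]) -> float:
    """Same value, but each alpha-vector is a function s -> alpha(s).

    Assumption for illustration: a neural network could play the role of
    each function, so no |S|-dimensional table has to be stored.
    """
    return max(sum(b * fn(s) for s, b in zip(states, belief))
               for fn in alpha_fns)


def make_table_fn(alpha: Sequence[float]) -> Callable[[int], float]:
    """Hypothetical stand-in for a trained network: a plain table lookup."""
    return lambda s: alpha[s]


# Toy two-state example with three alpha-vectors (made-up numbers).
ALPHAS = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
STATES = [0, 1]
ALPHA_FNS = [make_table_fn(alpha) for alpha in ALPHAS]

if __name__ == "__main__":
    for belief in ([1.0, 0.0], [0.5, 0.5], [0.9, 0.1]):
        print(belief,
              value_from_alpha_vectors(belief, ALPHAS),
              value_from_state_functions(belief, STATES, ALPHA_FNS))
```

Both evaluators return the same value at every belief, since the second only changes how $\alpha(s)$ is looked up. In a large state space the sum over all states would itself be intractable and would need to be approximated, for example by sampling states from the belief; that step is outside this sketch.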