Unrolling Dynamic Programming via Graph Filters

📅 2025-07-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost of dynamic programming (DP) in large state-action spaces and in problems with long-term dependencies, this paper proposes BellNet, a framework that models policy iteration as a cascade of learnable nonlinear graph filters, reformulating value-function iteration from a graph signal processing perspective for the first time. Grounded in Markov decision processes, BellNet uses differentiable graph filters that enable end-to-end training, with parameters optimized by minimizing the Bellman error. Its key advantages are learnability, cross-task transferability, and controllable computational complexity, which together yield a substantial reduction in the number of required iterations. Empirical evaluation in grid-world environments shows that BellNet reaches near-optimal policies in a small fraction of the iterations required by classical DP algorithms, while significantly improving inference efficiency.

📝 Abstract
Dynamic programming (DP) is a fundamental tool used across many engineering fields. The main goal of DP is to solve Bellman's optimality equations for a given Markov decision process (MDP). Standard methods like policy iteration exploit the fixed-point nature of these equations to solve them iteratively. However, these algorithms can be computationally expensive when the state-action space is large or when the problem involves long-term dependencies. Here we propose a new approach that unrolls and truncates policy iterations into a learnable parametric model dubbed BellNet, which we train to minimize the so-termed Bellman error from random value function initializations. Viewing the transition probability matrix of the MDP as the adjacency of a weighted directed graph, we draw insights from graph signal processing to interpret (and compactly re-parameterize) BellNet as a cascade of nonlinear graph filters. This fresh look facilitates a concise, transferable, and unifying representation of policy and value iteration, with an explicit handle on complexity during inference. Preliminary experiments conducted in a grid-like environment demonstrate that BellNet can effectively approximate optimal policies in a fraction of the iterations required by classical methods.
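The abstract's graph-signal-processing view can be made concrete: each per-action transition matrix acts as the adjacency of a weighted directed graph, so a Bellman backup is a one-hop diffusion (graph-filter) step on the value signal. The sketch below runs classical value iteration on a toy MDP from this viewpoint; the MDP itself (sizes, rewards, discount) is illustrative and not from the paper.

```python
import numpy as np

# Toy MDP: 4 states, 2 actions. P[a] is the per-action transition matrix;
# viewed as a graph adjacency, the product P[a] @ V is a one-hop
# diffusion (graph-filter) step on the value signal V.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)       # make each row a distribution
r = rng.random((n_actions, n_states))   # per-action rewards

V = np.zeros(n_states)
for _ in range(300):                    # classical value iteration
    Q = r + gamma * np.einsum("aij,j->ai", P, V)  # Bellman backup per action
    V_new = Q.max(axis=0)               # greedy maximization over actions
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

# Bellman residual ||max_a(r_a + gamma * P_a V) - V||_inf at the fixed point
residual = np.max(np.abs(
    (r + gamma * np.einsum("aij,j->ai", P, V)).max(axis=0) - V))
```

The fixed point of this iteration solves Bellman's optimality equation; BellNet's premise is that many such hand-crafted iterations can be replaced by a few learned filter layers.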
Problem

Research questions and friction points this paper is trying to address.

Solving Bellman's equations efficiently for large MDPs
Reducing computational cost of dynamic programming methods
Approximating optimal policies with fewer iterations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unrolls policy iterations into learnable BellNet
Reparameterizes BellNet as nonlinear graph filters
Reduces complexity with transferable unified representation
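The unrolling idea above can be sketched as a small forward pass: a fixed number of layers (truncated iterations), each applying a learnable polynomial graph filter in the transition matrix to the backed-up value signal, followed by a pointwise nonlinearity, trained by minimizing the Bellman error from random value initializations. This is a hypothetical minimal sketch, not the paper's implementation; the layer count `L`, filter order `K`, tap array `h`, and the ReLU choice are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma = 5, 0.9
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)   # transition matrix as graph adjacency
r = rng.random(n)                    # reward signal on the state graph

L, K = 3, 2                          # unrolled layers, polynomial filter order
h = rng.normal(scale=0.1, size=(L, K + 1))  # learnable filter taps (assumed)

def bellnet_forward(V, h):
    """Cascade of nonlinear graph filters: one hypothetical BellNet pass."""
    for layer in range(L):
        z = np.zeros(n)
        Pk = np.eye(n)               # P^0
        for k in range(K + 1):
            # filter the Bellman-backup signal with the k-th power of P
            z += h[layer, k] * (Pk @ (r + gamma * V))
            Pk = Pk @ P
        V = np.maximum(z, 0.0)       # pointwise nonlinearity (assumed ReLU)
    return V

V0 = rng.random(n)                   # random value-function initialization
V_hat = bellnet_forward(V0, h)
# Training would minimize this Bellman error over the taps h:
bellman_err = np.linalg.norm(r + gamma * P @ V_hat - V_hat)
```

Because the depth `L` and order `K` are fixed in advance, inference cost is controlled explicitly, which is the "explicit handle on complexity" the abstract refers to; the learned taps could in principle transfer across MDPs sharing the same graph structure.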