Rollout-Based Approximate Dynamic Programming for MDPs with Information-Theoretic Constraints

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the finite-horizon Markov decision problem with information-theoretic constraints, aiming to minimize directed information from the source process to the control process subject to stage-wise cost constraints. To overcome the computational bottleneck arising from explicit discretization of the continuous information state space in conventional approaches, we propose a truncated-rollout-based forward–backward approximate dynamic programming framework that avoids discretization while providing theoretical convergence guarantees. The method integrates Q-factor modeling, offline basis-policy approximation, and online rollout-based lookahead optimization, solving for the optimal control policy efficiently in two phases. Numerical experiments demonstrate that the proposed approach outperforms existing benchmark methods in both control performance (i.e., achieved cost) and computational efficiency (i.e., offline training time and online inference latency).
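The two-phase structure described above (an offline base policy, then online rollout-based lookahead) can be sketched on a toy problem. In this sketch, the 2-state/2-action MDP, its costs, and the fixed base policy are illustrative placeholders, not from the paper; in particular, the paper's state is a continuous information state, which is replaced here by a small discrete state space for readability.

```python
# Toy truncated-rollout sketch: a generic rollout lookahead over a base
# policy. All numbers and the base policy are made up for illustration.
P = {  # P[(state, action)] -> {next_state: probability}
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.2, 1: 0.8},
    (1, 0): {1: 1.0},
    (1, 1): {0: 0.5, 1: 0.5},
}
c = {(0, 0): 1.0, (0, 1): 0.3, (1, 0): 0.0, (1, 1): 2.0}  # stage costs
ACTIONS = (0, 1)

def base_policy(s):
    """Stand-in for an offline-approximated base policy (always action 0)."""
    return 0

def rollout_value(s, policy, horizon):
    """Expected cumulative cost of following `policy` for `horizon` steps
    (the truncation: costs beyond the horizon are ignored)."""
    if horizon == 0:
        return 0.0
    a = policy(s)
    return c[(s, a)] + sum(
        p * rollout_value(s2, policy, horizon - 1)
        for s2, p in P[(s, a)].items()
    )

def rollout_action(s, horizon):
    """Online phase: one-step lookahead minimizing an approximate Q-factor,
    estimated by a truncated rollout of the base policy thereafter."""
    return min(
        ACTIONS,
        key=lambda a: c[(s, a)] + sum(
            p * rollout_value(s2, base_policy, horizon - 1)
            for s2, p in P[(s, a)].items()
        ),
    )
```

From state 0 with a lookahead horizon of 3, the base policy repeatedly pays stage cost 1.0, while the rollout lookahead switches to action 1 at a lower approximate Q-factor, illustrating the cost-improvement property that rollout methods enjoy over their base policy.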

📝 Abstract
This paper studies a finite-horizon Markov decision problem with information-theoretic constraints, where the goal is to find an optimal control policy that minimizes the directed information from the controlled source process to the control process, subject to stage-wise cost constraints. This problem is known to admit a formulation as an unconstrained MDP with a continuous information state, expressed through Q-factors; we propose a new way of approximating its solution. To avoid the computational complexity of discretizing the continuous information-state space, we propose a truncated rollout-based backward-forward approximate dynamic programming (ADP) framework. Our approach consists of two phases: an offline base-policy approximation over a shorter time horizon, followed by an online rollout lookahead minimization, both supported by provable convergence guarantees. We supplement our theoretical results with a numerical example demonstrating the cost improvement of the rollout method over a previously proposed policy-approximation method, as well as the computational complexity observed in executing the offline and online phases of the two methods.
Problem

Research questions and friction points this paper is trying to address.

Minimizing directed information in MDPs subject to stage-wise cost constraints
Handling the continuous information state arising in the unconstrained MDP reformulation
Avoiding explicit discretization of the information-state space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rollout-based approximate dynamic programming framework
Offline base-policy approximation and online rollout lookahead minimization
Avoids discretization of continuous information-state space