🤖 AI Summary
This paper addresses a class of multistage stochastic programming problems characterized by continuous state and action spaces, decision-dependent uncertainty, and a limited form of statistical learning. To overcome the expressive limitations of conventional models, we extend the policy graph framework so that one-step transition probabilities may depend on decisions, explicitly capturing the feedback effect of decisions on uncertainty, and we incorporate a limited learning mechanism. Building on this model, we develop new variants of stochastic dual dynamic programming (SDDP), including an approximation scheme for the non-convexities this structure introduces, tailored to structured Markov decision processes with continuous state and action spaces. We illustrate the expressiveness of the modeling approach with a series of examples of increasing complexity. The work integrates statistical learning with sequential decision-making within a single stochastic optimization framework, aiming at both enhanced expressiveness and tractability.
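To make the decision-dependent transition idea concrete, here is a minimal sketch in Python. All names (`Node`, `transition`, `sample_successor`) and the toy probabilities are our own illustration of the concept, not the paper's notation or implementation: a node's outgoing transition probabilities are a function of the action taken at that node.

```python
import random
from typing import Callable, Dict

# Illustrative sketch (made-up names, not the paper's code): a policy-graph
# node whose one-step transition probabilities depend on the decision
# taken at it -- the "decision-dependent uncertainty" described above.

class Node:
    def __init__(self, name: str,
                 transition: Callable[[float], Dict[str, float]]):
        self.name = name
        # Maps the action taken at this node to a probability
        # distribution over successor node names.
        self.transition = transition

def sample_successor(node: Node, action: float) -> str:
    dist = node.transition(action)
    names, weights = zip(*dist.items())
    return random.choices(names, weights=weights, k=1)[0]

# Toy feedback effect: a larger action (e.g. more investment) shifts
# probability mass toward the "good" successor. Valid for action in [0, 0.8].
root = Node("root", lambda a: {"good": 0.2 + a, "bad": 0.8 - a})
print(sample_successor(root, action=0.5))  # "good" with probability 0.7
```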
📝 Abstract
We study a class of multi-stage stochastic programs, which incorporate modeling features from Markov decision processes (MDPs). This class includes structured MDPs with continuous state and action spaces. We extend policy graphs to include decision-dependent uncertainty for one-step transition probabilities as well as a limited form of statistical learning. We focus on the expressiveness of our modeling approach, illustrating ideas with a series of examples of increasing complexity. As a solution method, we develop new variants of stochastic dual dynamic programming, including approximations to handle non-convexities.
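As background on the solution method, the sketch below illustrates the cutting-plane machinery at the heart of SDDP-style algorithms: a convex cost-to-go function is replaced by an outer approximation, the pointwise maximum of affine cuts collected on backward passes. The `CutPool` class and the toy quadratic value function are our own simplified illustration under those assumptions, not code or notation from the paper.

```python
from typing import List, Tuple

# Sketch of SDDP's outer approximation of a convex cost-to-go function:
# V(x) is under-approximated by max_k (alpha_k + beta_k * x).

class CutPool:
    def __init__(self) -> None:
        self.cuts: List[Tuple[float, float]] = []  # (intercept, slope)

    def add_cut(self, value: float, slope: float, x: float) -> None:
        # A cut tangent at x: V(y) >= value + slope * (y - x).
        self.cuts.append((value - slope * x, slope))

    def evaluate(self, x: float) -> float:
        # Lower bound from the pointwise max of all cuts so far.
        return max((a + b * x for a, b in self.cuts), default=0.0)

# Toy convex cost-to-go V(x) = (x - 1)**2; each "backward pass"
# adds the tangent cut at a sampled state.
pool = CutPool()
for x in (0.0, 0.5, 2.0):
    pool.add_cut(value=(x - 1) ** 2, slope=2 * (x - 1), x=x)
print(pool.evaluate(1.0))  # -0.25, a valid underestimate of V(1) = 0
```

When the underlying problem is non-convex, as with the decision-dependent transitions above, this pointwise-maximum-of-cuts representation is no longer exact, which is why the paper develops approximation variants.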