🤖 AI Summary
This paper investigates the synthesis of strategies in Markov decision processes (MDPs) that maximise the expected window mean-payoff value while simultaneously guaranteeing that the value stays above a given threshold. Three kinds of guarantees are considered: sure (deterministic, holding against an adversarial environment), almost-sure (holding with probability one), and probabilistic (holding with at least a given probability). The paper establishes that for the fixed window mean-payoff objective all three problems are in PTIME, while for the bounded window mean-payoff objective they are in NP ∩ coNP, matching the complexity of maximising the expectation without any guarantee. It further shows that pure finite-memory strategies suffice under sure and almost-sure guarantees, whereas randomised strategies are necessary in general under probabilistic guarantees. Methodologically, the paper combines game-theoretic analysis, stochastic model checking, linear programming, and strategy synthesis, providing both theoretical foundations and practical tools for average-reward optimisation with formal guarantees.
📝 Abstract
The window mean-payoff objective strengthens the classical mean-payoff objective by computing the mean payoff over a finite window that slides along an infinite path. Two variants have been considered: in one, the maximum window length is fixed and given; in the other, it is not fixed but is required to be bounded. In this paper, we look at the problem of synthesising strategies in Markov decision processes that maximise the window mean-payoff value in expectation, while simultaneously guaranteeing that the value is above a certain threshold. We solve the synthesis problem for three different kinds of guarantees: sure (which must be satisfied in the worst case, that is, against an adversarial environment), almost-sure (which must be satisfied with probability one), and probabilistic (which must be satisfied with at least some given probability $p$). We show that for the fixed window mean-payoff objective, all three problems are in $\mathsf{PTIME}$, while for the bounded window mean-payoff objective, they are in $\mathsf{NP} \cap \mathsf{coNP}$, and thus have the same complexity as maximising the expected performance without any guarantee. Moreover, we show that pure finite-memory strategies suffice for maximising the expectation with sure and almost-sure guarantees, whereas, for maximising the expectation with a probabilistic guarantee, randomised strategies are necessary in general.
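To make the sliding-window idea concrete, here is a minimal sketch of the "good window" test that underlies the fixed window mean-payoff objective: at a position along a path, one asks whether some window of length at most $\ell$ starting there achieves a mean payoff at or above a threshold. The function name, argument names, and the convention of a non-strict threshold comparison are our own illustrative choices, not taken from the paper.

```python
def good_window(weights, i, ell, threshold=0.0):
    """Illustrative sketch (not the paper's algorithm): return True if some
    window of length at most `ell` starting at position `i` of the weight
    sequence has mean payoff >= `threshold`."""
    total = 0.0
    for j in range(i, min(i + ell, len(weights))):
        total += weights[j]          # running sum of the window weights
        if total / (j - i + 1) >= threshold:
            return True              # a window of length j - i + 1 suffices
    return False

# A path prefix with weights [-1, 3]: the length-2 window has mean 1 >= 0,
# so position 0 is good for ell = 2, even though the length-1 window is not.
print(good_window([-1, 3], 0, 2))   # True
print(good_window([-1, -1], 0, 2))  # False
```

On an infinite path, the fixed window objective asks for this test to eventually hold at every position; the bounded variant asks that some (unspecified) bound $\ell$ works uniformly.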