AI Summary
This paper investigates finite-state Markov decision processes (MDPs) with a combined objective: an energy constraint in one dimension and a strictly positive long-run average reward (mean payoff) in a second. The challenge is to synthesize controllers that never run out of energy while almost surely attaining a strictly positive mean payoff. We show that finite memory suffices for almost-surely winning strategies, in contrast to the closely related energy-parity objective, where such strategies require infinite memory in general. The memory bound is tight: exponential memory is sufficient, even for deterministic strategies, and necessary, even for randomized ones. The upper bound still holds when the mean-payoff part of the objective is generalized to multidimensional strictly positive mean payoff. Combining techniques from MDP theory, energy games, and mean-payoff analysis, we also obtain a pseudo-polynomial-time algorithm for deciding whether an almost-surely winning strategy exists.
Abstract
We consider finite-state Markov decision processes with the combined Energy-MeanPayoff objective. The controller tries to avoid running out of energy while simultaneously attaining a strictly positive mean payoff in a second dimension. We show that finite memory suffices for almost surely winning strategies for the Energy-MeanPayoff objective. This is in contrast to the closely related Energy-Parity objective, where almost surely winning strategies require infinite memory in general. We show that exponential memory is sufficient (even for deterministic strategies) and necessary (even for randomized strategies) for almost surely winning Energy-MeanPayoff. The upper bound holds even if the strictly positive mean payoff part of the objective is generalized to multidimensional strictly positive mean payoff. Finally, it is decidable in pseudo-polynomial time whether an almost surely winning strategy exists.
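To make the two parts of the objective concrete, here is a minimal Python sketch that simulates a finite prefix of one run and tracks both quantities. Everything in it is an illustrative assumption rather than material from the paper: the two-state MDP, the `simulate` helper, the memoryless `strategy`, and the convention that the energy condition means the energy level never drops below zero. A finite simulation cannot establish almost-sure winning; it only shows what is being measured.

```python
import random

def simulate(transitions, strategy, start, initial_energy, steps, seed=0):
    """Follow `strategy` for `steps` steps.

    Returns (lowest energy level seen, empirical mean payoff of the prefix).
    `transitions[(state, action)]` is a list of
    (probability, next_state, energy_change, payoff) tuples.
    """
    rng = random.Random(seed)
    state, energy, payoff_sum = start, initial_energy, 0
    lowest = energy
    for _ in range(steps):
        outcomes = transitions[(state, strategy(state))]
        r, acc = rng.random(), 0.0
        # Sample one outcome; the last one is used if rounding leaves r >= acc.
        for prob, nxt, d_energy, payoff in outcomes:
            acc += prob
            if r < acc:
                break
        state = nxt
        energy += d_energy
        payoff_sum += payoff
        lowest = min(lowest, energy)
    return lowest, payoff_sum / steps

# Hypothetical toy MDP: "a" gains energy but no payoff, "b" spends energy for payoff.
transitions = {
    ("a", "x"): [(0.5, "a", +1, 0), (0.5, "b", +1, 0)],
    ("b", "y"): [(1.0, "a", -1, +1)],
}
strategy = lambda s: "x" if s == "a" else "y"  # memoryless, for illustration only

min_energy, mean = simulate(transitions, strategy, "a", initial_energy=0, steps=10_000)
print(min_energy, mean)
```

In this toy example every visit to `b` is preceded by a visit to `a`, so the energy level never falls below its initial value, and the empirical mean payoff of the prefix is positive. The paper's results concern the general case, where a winning strategy may need exponential memory.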