Finite-memory Strategies for Almost-sure Energy-MeanPayoff Objectives in MDPs

📅 2024-04-22

🏛️ arXiv.org

🤖 AI Summary

This paper studies finite-state Markov decision processes (MDPs) with the combined Energy-MeanPayoff objective. The controller must never run out of energy in one dimension while attaining a strictly positive long-run average reward (mean payoff) in a second dimension. The authors show that finite-memory strategies suffice for winning this objective almost surely. This contrasts with the closely related Energy-Parity objective, where almost surely winning strategies require infinite memory in general. Exponential memory is sufficient (even for deterministic strategies) and necessary (even for randomized strategies), so the bound is tight. The upper bound continues to hold when the mean-payoff part is generalized to multidimensional strictly positive mean payoff. Finally, whether an almost surely winning strategy exists is decidable in pseudo-polynomial time.

📝 Abstract

We consider finite-state Markov decision processes with the combined Energy-MeanPayoff objective. The controller tries to avoid running out of energy while simultaneously attaining a strictly positive mean payoff in a second dimension. We show that finite memory suffices for almost surely winning strategies for the Energy-MeanPayoff objective. This is in contrast to the closely related Energy-Parity objective, where almost surely winning strategies require infinite memory in general. We show that exponential memory is sufficient (even for deterministic strategies) and necessary (even for randomized strategies) for almost surely winning Energy-MeanPayoff. The upper bound holds even if the strictly positive mean payoff part of the objective is generalized to multidimensional strictly positive mean payoff. Finally, it is decidable in pseudo-polynomial time whether an almost surely winning strategy exists.
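The following sketch makes the combined objective concrete by checking it on a single ultimately periodic run: a finite prefix followed by a cycle that repeats forever, where each step carries an energy change and a payoff. It is an illustration under stated assumptions, not the paper's method. The function name, the (energy_delta, payoff) step encoding, and the initial energy credit are hypothetical. The energy condition is assumed to mean that the initial credit plus the accumulated energy never drops below 0. Strategies in an MDP generally induce probability distributions over runs rather than one lasso-shaped run, so this only shows what a single run must satisfy.

```python
from typing import List, Tuple

# Hypothetical encoding: each step of a run is (energy_delta, payoff).
Step = Tuple[int, int]


def lasso_satisfies_energy_meanpayoff(
    prefix: List[Step], cycle: List[Step], initial_energy: int
) -> bool:
    """Check Energy-MeanPayoff on the run: prefix, then cycle repeated forever."""
    if not cycle:
        raise ValueError("cycle must be non-empty")
    energy = initial_energy
    # Walk the prefix and one pass of the cycle; energy must never drop below 0.
    for delta, _ in prefix + cycle:
        energy += delta
        if energy < 0:
            return False
    # A cycle with negative total energy change eventually exhausts any credit.
    if sum(delta for delta, _ in cycle) < 0:
        return False
    # The run's mean payoff equals the cycle's average payoff; since the cycle
    # length is positive, "average > 0" is equivalent to "sum > 0".
    return sum(payoff for _, payoff in cycle) > 0


print(lasso_satisfies_energy_meanpayoff([(2, 0)], [(-1, 1), (1, 0)], initial_energy=0))  # True
```

Checking the prefix plus one pass of the cycle is enough. When the cycle's total energy change is nonnegative, every later pass starts at an energy level at least as high as the first pass did.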

Problem

Finite-memory strategies for combined Energy-MeanPayoff objectives

Avoid energy depletion while achieving a strictly positive mean payoff

Deciding existence of winning strategies in pseudo-polynomial time

Innovation

Finite memory suffices for almost surely winning strategies, unlike for Energy-Parity objectives

Exponential memory is sufficient (even for deterministic strategies) and necessary (even for randomized strategies)

Decidability in pseudo-polynomial time for strategy existence
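Finite memory matters partly because a finite-memory strategy in a finite MDP induces a finite Markov chain on pairs of MDP states and memory states. Within a bottom strongly connected component of such a chain, almost every run has the same mean payoff: the stationary-weighted average reward. The sketch below computes that value for an irreducible chain using exact rational arithmetic. It is an illustrative helper under stated assumptions, not the paper's pseudo-polynomial algorithm. The names `stationary_distribution` and `mean_payoff`, the row-stochastic matrix encoding, and the example chain are all hypothetical. The sketch checks only the mean-payoff dimension, not the energy condition.

```python
from fractions import Fraction
from typing import List, Sequence


def stationary_distribution(P: Sequence[Sequence[Fraction]]) -> List[Fraction]:
    """Solve pi = pi P with sum(pi) = 1 for an irreducible row-stochastic matrix P."""
    n = len(P)
    # Equations (P^T - I) pi = 0; they sum to zero, so one of them is redundant.
    A = [[Fraction(P[j][i]) - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    # Replace the redundant last equation by the normalisation sum(pi) = 1.
    A[n - 1] = [Fraction(1)] * n
    b[n - 1] = Fraction(1)
    # Gauss-Jordan elimination in exact arithmetic (the system is nonsingular).
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def mean_payoff(P: Sequence[Sequence[Fraction]], reward: Sequence[int]) -> Fraction:
    """Long-run average reward of almost every run of an irreducible finite chain."""
    pi = stationary_distribution(P)
    return sum((p * r for p, r in zip(pi, reward)), Fraction(0))


# Hypothetical chain: state 0 stays or moves to 1 with probability 1/2; state 1 returns to 0.
P = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1), Fraction(0)]]
print(mean_payoff(P, [-1, 3]))  # 1/3, strictly positive
```

Exact fractions avoid floating-point error, which matters here because the objective depends on whether the mean payoff is strictly positive or exactly zero.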
