Mixing Any Cocktail with Limited Ingredients: On the Structure of Payoff Sets in Multi-Objective MDPs and its Impact on Randomised Strategies

📅 2025-02-25

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work investigates which expected payoff vectors are achievable in multi-objective Markov decision processes (MDPs), focusing on the role of randomized strategies when pure strategies are insufficient. The paper studies the structure of the set of expected payoff vectors over all strategies and shows that, for any multi-dimensional payoff whose expectation is well-defined under all strategies, every achievable expected payoff vector can be approximated to arbitrary precision by mixing finitely many pure strategies. Moreover, when the expected payoff is finite under all strategies, every achievable expected payoff vector can be obtained exactly by mixing finitely many strategies. These results clarify how much randomization strategies need in multi-objective MDPs.


📝 Abstract

We consider multi-dimensional payoff functions in Markov decision processes, and ask whether a given expected payoff vector can be achieved or not. In general, pure strategies (i.e., strategies that do not resort to randomisation) do not suffice for this problem. We study the structure of the set of expected payoff vectors of all strategies given a multi-dimensional payoff function, and its consequences regarding randomisation requirements for strategies. In particular, we prove that for any payoff for which the expectation is well-defined under all strategies, it is sufficient to mix (i.e., randomly selecting a pure strategy at the start of a play and committing to it for the rest of the play) finitely many pure strategies to approximate any expected payoff vector up to any precision. Furthermore, for any payoff for which the expected payoff is finite under all strategies, any expected payoff can be obtained exactly by mixing finitely many strategies.
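As a rough illustration of what mixing means here, the sketch below uses a toy example that is not taken from the paper: a hypothetical one-step MDP with two actions, where the two pure strategies yield the payoff vectors (1, 0) and (0, 1). When all expected payoffs are finite, linearity of expectation implies that the expected payoff of a mix is the convex combination of the pure-strategy payoffs, weighted by the probability of selecting each pure strategy at the start of the play. The function name `mixed_payoff` and the example data are assumptions made for illustration only.

```python
from fractions import Fraction


def mixed_payoff(weights, pure_payoffs):
    """Expected payoff vector of a mix of pure strategies.

    weights[i] is the probability of committing to pure strategy i at the
    start of the play; pure_payoffs[i] is that strategy's (finite) expected
    payoff vector. The result is their convex combination.
    """
    if len(weights) != len(pure_payoffs):
        raise ValueError("need exactly one weight per pure strategy")
    if any(w < 0 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must form a probability distribution")
    dim = len(pure_payoffs[0])
    return tuple(
        sum(w * payoff[i] for w, payoff in zip(weights, pure_payoffs))
        for i in range(dim)
    )


# Hypothetical toy MDP (an assumption, not from the paper): one decision,
# action a gives payoff (1, 0) and action b gives payoff (0, 1).
pure_payoffs = [
    (Fraction(1), Fraction(0)),
    (Fraction(0), Fraction(1)),
]
target = (Fraction(1, 2), Fraction(1, 2))

# No pure strategy achieves the target, but an even mix does.
weights = [Fraction(1, 2), Fraction(1, 2)]
print(target in pure_payoffs)                         # False
print(mixed_payoff(weights, pure_payoffs) == target)  # True
```

Exact rational arithmetic via `Fraction` is used so that the check that the weights sum to 1 is not affected by floating-point rounding.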

Problem

Research questions and friction points this paper is trying to address.

Achieving expected payoff vectors

Structure of payoff sets

Randomised strategies in MDPs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-dimensional payoff functions

Mixing pure strategies

Finitely many pure strategies

🔎 Similar Papers

Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning