🤖 AI Summary
This work addresses the problem of computing the complete Pareto front for average-cost multi-objective Markov decision processes (MOMDPs). By leveraging convex geometry and structural analysis of policies, it rigorously establishes that the Pareto front forms a continuous piecewise-linear surface on the boundary of a convex polytope, with each vertex corresponding uniquely to a deterministic policy. The study further reveals that policies associated with adjacent vertices differ in only a single state and provides a closed-form expression for the mixing coefficients along the edges connecting them. Notably, the entire Pareto front can be constructed without explicitly solving any MDP. In the context of remote state estimation, all Pareto-optimal policies are shown to be threshold-based, enabling direct derivation of optimal solutions even for certain non-convex MDPs.
📝 Abstract
Many communication and control problems are cast as multi-objective Markov decision processes (MOMDPs). The complete solution to an MOMDP is the Pareto front. Much of the literature approximates this front via scalarization into single-objective MDPs. Recent work has begun to characterize the full front in discounted or simple bi-objective settings by exploiting its geometry. In this work, we characterize the exact front in average-cost MOMDPs. We show that the front is a continuous, piecewise-linear surface lying on the boundary of a convex polytope. Each vertex corresponds to a deterministic policy, and adjacent vertices differ in exactly one state. Each edge is realized as a convex combination of the policies at its endpoints, with the mixing coefficient given in closed form. We apply these results to a remote state estimation problem, where each vertex on the front corresponds to a threshold policy. The exact Pareto front and solutions to certain non-convex MDPs can be obtained without explicitly solving any MDP.
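The geometric picture above can be illustrated on a toy example. The sketch below (not the paper's algorithm; all transition probabilities and costs are invented for illustration) enumerates the deterministic policies of a two-state, two-action MDP with two average-cost objectives, plots each policy as a point in cost space, and takes the lower convex hull. Per the results summarized above, the hull vertices are the deterministic Pareto-optimal policies, and randomized mixtures of adjacent vertices trace the connecting edges.

```python
from itertools import product

# Toy two-state, two-action MDP (illustrative numbers, not from the paper).
# P[s][a] is the probability of jumping to the other state under action a;
# c1 and c2 are the two cost objectives to be minimized on average.
P  = {0: [0.3, 0.8], 1: [0.6, 0.2]}
c1 = {0: [2.0, 0.5], 1: [1.0, 3.0]}
c2 = {0: [0.5, 2.5], 1: [2.0, 0.2]}

def average_costs(policy):
    """Long-run average (c1, c2) under a deterministic policy (a0, a1)."""
    a0, a1 = policy
    q, r = P[0][a0], P[1][a1]            # P(0 -> 1), P(1 -> 0)
    pi0, pi1 = r / (q + r), q / (q + r)  # stationary distribution
    return (pi0 * c1[0][a0] + pi1 * c1[1][a1],
            pi0 * c2[0][a0] + pi1 * c2[1][a1])

# One point in cost space per deterministic policy.
points = {pol: average_costs(pol) for pol in product([0, 1], repeat=2)}

def cross(o, a, b):
    """2-D cross product of vectors o->a and o->b (turn direction)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

# Lower convex hull (Andrew's monotone chain): its vertices are the
# deterministic policies on the Pareto front of this toy example.
hull = []
for pol, pt in sorted(points.items(), key=lambda kv: kv[1]):
    while len(hull) >= 2 and cross(points[hull[-2]], points[hull[-1]], pt) <= 0:
        hull.pop()
    hull.append(pol)

for pol in hull:
    x, y = points[pol]
    print(pol, (round(x, 3), round(y, 3)))
```

With these particular numbers, one of the four deterministic policies lands above the hull and is excluded, and consecutive hull vertices happen to differ in the action of exactly one state, consistent with the adjacency property stated in the abstract.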