🤖 AI Summary
This work addresses the synthesis of safe policies for multi-objective interval Markov decision processes (IMDPs) under transition uncertainty. By modeling value iteration as a switched affine system with interval uncertainties, the authors introduce, for the first time, polyhedral Lyapunov functions to construct attractive invariant sets, and use control-theoretic stability principles to directly generate policies that provably converge to the target set. This approach avoids the conservatism of traditional quadratic Lyapunov functions and removes the need for explicit Pareto front computation, which improves scalability. The method is validated on case studies including a recycling robot and an electric vehicle battery, showing that the synthesized policies retain their convergence guarantees despite transition uncertainty.
📝 Abstract
Decision-making under uncertainty is central to many safety-critical applications, where decisions must be guided by probabilistic modeling formalisms. This paper introduces a novel approach to policy synthesis in multi-objective interval Markov decision processes using polyhedral Lyapunov functions. Unlike previous Lyapunov-based methods that mainly rely on quadratic functions, our method utilizes polyhedral functions to enhance accuracy in managing uncertainties within value iteration of dynamic programming. We reformulate the value iteration algorithm as a switched affine system with interval uncertainties and apply control-theoretic stability principles to synthesize policies that guide the system toward a desired target set. By constructing an invariant set of attraction, we ensure that the synthesized policies provide convergence guarantees while minimizing the impact of transition uncertainty in the underlying model. Our methodology removes the need for computationally intensive Pareto curve computations by directly determining a policy that brings objectives within a specified range of their target values. We validate our approach through numerical case studies, including a recycling robot and an electric vehicle battery, demonstrating its effectiveness in achieving policy synthesis under uncertainty.
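To make the reformulation concrete, the following is a minimal sketch, not the authors' construction. It treats one value-iteration step under a fixed policy as an affine update x+ = r + γ·P·x, where each row of P is chosen adversarially inside its probability interval. It then checks that the infinity norm of the distance to the fixed point, which is the simplest polyhedral function, decreases by at least the factor γ at every step. The function names, the 3-state model, the reward vector, and the discount factor are all hypothetical, and the example is single-objective for brevity.

```python
import numpy as np

def worst_case_row(lo, hi, x):
    """Return p with lo <= p <= hi and sum(p) == 1 that minimizes p @ x.

    Greedy rule: start at the lower bounds, then push the remaining
    probability mass onto the successors with the smallest values first.
    Assumes the intervals are feasible (sum(lo) <= 1 <= sum(hi)).
    """
    p = lo.copy()
    budget = 1.0 - lo.sum()
    for j in np.argsort(x):
        add = min(hi[j] - lo[j], budget)
        p[j] += add
        budget -= add
    return p

def robust_step(x, r, lo, hi, gamma):
    """One affine update x+ = r + gamma * P @ x with adversarial P in [lo, hi]."""
    P = np.array([worst_case_row(lo[i], hi[i], x) for i in range(len(x))])
    return r + gamma * P @ x

# Hypothetical 3-state interval model under a fixed policy (illustrative numbers).
lo = np.array([[0.1, 0.2, 0.3],
               [0.0, 0.5, 0.2],
               [0.3, 0.1, 0.2]])
hi = np.array([[0.5, 0.6, 0.6],
               [0.3, 0.8, 0.5],
               [0.6, 0.4, 0.5]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Approximate the fixed point of the robust update by iterating long enough.
x_star = np.zeros(3)
for _ in range(2000):
    x_star = robust_step(x_star, r, lo, hi, gamma)

# Polyhedral Lyapunov candidate: V(x) = ||x - x_star||_inf.
x = np.array([10.0, -5.0, 3.0])
levels = []
for _ in range(20):
    levels.append(np.linalg.norm(x - x_star, ord=np.inf))
    x = robust_step(x, r, lo, hi, gamma)
levels.append(np.linalg.norm(x - x_star, ord=np.inf))
```

Because V shrinks by at least γ per step, every sublevel set {x : V(x) ≤ c} is a box around the fixed point that the iteration never leaves once it has entered. This is the simplest instance of an invariant set of attraction; the paper's polyhedral functions and target sets are more general than this box.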