Risk-aware Markov Decision Processes Using Cumulative Prospect Theory

📅 2025-05-14
🤖 AI Summary
This work addresses the lack of sequential decision-making models that capture empirically observed human risk preferences. We integrate Cumulative Prospect Theory (CPT) into Markov chains (MCs) and Markov decision processes (MDPs) and study the computational complexity of the resulting models. Theoretically, we give an alternative view of the CPT-value of MCs and MDPs, connect CPT optimization to multi-objective reachability analysis, and show that memoryless randomized strategies are necessary and sufficient for optimality. Algorithmically, we provide a polynomial-time algorithm for MCs and show that computing the CPT-value of infinite-horizon MDPs is in EXPTIME and fixed-parameter tractable. These results open the complexity-theoretic study of CPT in MCs and MDPs, which had not been addressed in the literature.

📝 Abstract
Cumulative prospect theory (CPT) is the first theory for decision-making under uncertainty that combines full theoretical soundness and empirically realistic features [P.P. Wakker - Prospect theory: For risk and ambiguity, Page 2]. While CPT was originally considered in one-shot settings for risk-aware decision-making, we consider CPT in sequential decision-making. The most fundamental and well-studied models for sequential decision-making are Markov chains (MCs) and their generalization, Markov decision processes (MDPs). The complexity-theoretic study of MCs and MDPs with CPT is a fundamental problem that has not been addressed in the literature. Our contributions are as follows. First, we present an alternative viewpoint for the CPT-value of MCs and MDPs. This allows us to establish a connection with multi-objective reachability analysis and to conclude the strategy complexity result that memoryless randomized strategies are necessary and sufficient for optimality. Second, based on this connection, we provide an algorithm for computing the CPT-value in MDPs with infinite-horizon objectives. We show that the problem is in EXPTIME and fixed-parameter tractable. Moreover, we provide a polynomial-time algorithm for the special case of MCs.
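
For readers unfamiliar with CPT, the one-shot CPT value of a finite prospect can be sketched in a few lines of Python. This is a minimal sketch, not the paper's construction: it assumes the functional forms and median parameter estimates from Tversky and Kahneman (1992) and a reference point of 0, whereas the paper may allow more general utility and weighting functions. The names `tk_weight`, `tk_utility`, and `cpt_value` are hypothetical.

```python
def tk_weight(p, gamma):
    """Inverse-S probability weighting (assumed Tversky-Kahneman 1992 form)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:  # also absorbs tiny floating-point overshoot above 1
        return 1.0
    return p**gamma / (p**gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)


def tk_utility(x, alpha=0.88, lam=2.25):
    """Concave for gains, convex and steeper (loss aversion) for losses."""
    return x**alpha if x >= 0 else -lam * (-x) ** alpha


def cpt_value(outcomes, probs,
              w_gain=lambda p: tk_weight(p, 0.61),
              w_loss=lambda p: tk_weight(p, 0.69),
              u=tk_utility):
    """CPT value of a discrete prospect relative to reference point 0."""
    pairs = sorted(zip(outcomes, probs))
    total = 0.0

    # Losses: accumulate probability from the worst outcome upward.
    cum = 0.0
    for x, p in pairs:
        if x >= 0:
            break
        total += (w_loss(cum + p) - w_loss(cum)) * u(x)
        cum += p

    # Gains: accumulate probability from the best outcome downward.
    cum = 0.0
    for x, p in reversed(pairs):
        if x < 0:
            break
        total += (w_gain(cum + p) - w_gain(cum)) * u(x)
        cum += p

    return total


# A fair coin flip between losing and winning 100 is valued negatively,
# because losses are weighted more heavily than gains.
print(cpt_value([-100, 100], [0.5, 0.5]))
```

With identity weighting and identity utility, `cpt_value` reduces to the ordinary expected value, which is a useful sanity check when swapping in other functions.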
Problem

Research questions and friction points this paper is trying to address.

Extends Cumulative Prospect Theory to sequential decision-making in MDPs
Analyzes complexity of CPT in Markov chains and decision processes
Develops algorithms for CPT-value computation in infinite-horizon MDPs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines CPT with MDPs for sequential decisions
Links CPT-value to multi-objective reachability analysis
Provides an EXPTIME, fixed-parameter tractable algorithm for CPT-value computation in MDPs and a polynomial-time algorithm for MCs
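
The link to reachability can be illustrated on a Markov chain: once the probability of reaching each outcome is known, a CPT-style value depends only on that probability vector. The sketch below is an illustration under stated assumptions, not the paper's algorithm. It assumes payoffs are attached to absorbing states, that every non-absorbing state reaches an absorbing state with probability 1, and it uses a hypothetical helper `absorption_distribution` that solves the standard linear system (I - Q) X = R exactly with rational arithmetic.

```python
from fractions import Fraction


def absorption_distribution(chain, start):
    """Probability of ending in each absorbing state when starting in `start`.

    `chain` maps each non-absorbing state to a dict {successor: probability}.
    States that never appear as keys (or map to an empty dict) are absorbing.
    Assumption: absorption happens with probability 1, so I - Q is invertible.
    """
    transient = [s for s in chain if chain[s]]
    terminal = sorted({t for s in transient for t in chain[s]} - set(transient))
    if start not in transient:
        return {start: Fraction(1)}
    idx = {s: i for i, s in enumerate(transient)}
    n, m = len(transient), len(terminal)

    # Build the augmented matrix [I - Q | R], one row per transient state.
    rows = []
    for s in transient:
        row = [Fraction(0)] * (n + m)
        row[idx[s]] += 1
        for succ, p in chain[s].items():
            if succ in idx:
                row[idx[succ]] -= Fraction(p)
            else:
                row[n + terminal.index(succ)] += Fraction(p)
        rows.append(row)

    # Gauss-Jordan elimination over the rationals (exact, polynomial time).
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [v * inv for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]

    return {t: rows[idx[start]][n + j] for j, t in enumerate(terminal)}


# Hypothetical two-state chain with absorbing outcomes "win" and "lose".
half = Fraction(1, 2)
chain = {
    "s0": {"s1": half, "lose": half},
    "s1": {"s0": half, "win": half},
}
dist = absorption_distribution(chain, "s0")
print(dist)  # lose: 2/3, win: 1/3
```

Mapping each absorbing state to its payoff turns `dist` into a one-shot prospect, to which any CPT functional can then be applied. In an MDP the reachable probability vectors depend on the strategy, which is where multi-objective reachability enters.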