Multiplicative Rewards in Markovian Models

📅 2025-04-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the computation of expected multiplicative rewards, where per-step rewards are multiplied rather than summed, in Markov chains (MCs) and Markov decision processes (MDPs). Unlike additive rewards, the expected value may diverge to infinity not only due to recurrent states but also due to transient ones, and the authors characterize when this happens. For MCs, the value can be computed in polynomial time given an oracle for the comparison of succinctly represented integers (CSRI), a problem known to be polynomial-time solvable only subject to number-theoretic conjectures; deciding whether the value is infinite or 0 is at least as hard as CSRI, while deciding whether it is one of these two can be done in polynomial time. For MDPs, the optimal value can be computed in polynomial space, a refined complexity classification is given, and the complexity of optimal schedulers is analyzed. The techniques developed for MDPs additionally solve the multiplicative variant of the stochastic shortest path problem, and all considered problems become polynomial-time solvable for MCs and MDPs in which an absorbing state is reached almost surely. The work bridges probabilistic model checking, number-theoretic analysis (via CSRI), polynomial-space algorithm design, and the theory of absorbing Markov chains.

📝 Abstract
This paper studies the expected value of multiplicative rewards, where rewards obtained in each step are multiplied (instead of the usual addition), in Markov chains (MCs) and Markov decision processes (MDPs). One of the key differences to additive rewards is that the expected value may diverge to infinity not only due to recurrent, but also due to transient states. For MCs, computing the value is shown to be possible in polynomial time given an oracle for the comparison of succinctly represented integers (CSRI), which is only known to be solvable in polynomial time subject to number-theoretic conjectures. Interestingly, distinguishing whether the value is infinite or 0 is at least as hard as CSRI, while determining if it is one of these two can be done in polynomial time. In MDPs, the optimal value can be computed in polynomial space. Further refined complexity results and results on the complexity of optimal schedulers are presented. The techniques developed for MDPs additionally allow to solve the multiplicative variant of the stochastic shortest path problem. Finally, for MCs and MDPs where an absorbing state is reached almost surely, all considered problems are solvable in polynomial time.
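To make the absorbing case concrete: when an absorbing state is reached almost surely, the expected multiplicative reward satisfies a linear system, which explains the polynomial-time result. Writing v(s) for the expected product of rewards starting in s, one gets v(s) = r(s) · Σ_t P(s, t) · v(t) for transient s and v(s) = 1 at the absorbing state. A minimal sketch of this computation on a toy chain (all numbers below are hypothetical, not taken from the paper):

```python
import numpy as np

# Toy absorbing Markov chain: states 0 and 1 are transient, state 2 is absorbing.
# P[s, t] is the transition probability; r[s] is the reward multiplied on each
# step taken from transient state s.
P = np.array([
    [0.2, 0.5, 0.3],
    [0.4, 0.1, 0.5],
    [0.0, 0.0, 1.0],
])
r = np.array([0.9, 1.2])  # per-step rewards for the transient states

# For transient s:  v(s) = r(s) * sum_t P[s, t] * v(t),  with v(absorbing) = 1.
# Splitting P into Q (transient -> transient) and R (transient -> absorbing)
# yields the linear system  (I - diag(r) Q) v = diag(r) R.
Q = P[:2, :2]
R = P[:2, 2]
A = np.eye(2) - np.diag(r) @ Q
b = np.diag(r) @ R
v = np.linalg.solve(A, b)
print(v)  # v[s] = expected product of rewards when starting from state s
```

Solving one linear system of size |S| is what makes the absorbing case polynomial-time; the subtleties the paper addresses (divergence, CSRI-hardness) arise precisely when absorption is not guaranteed.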
Problem

Research questions and friction points this paper is trying to address.

Computing expected multiplicative rewards in Markov chains and decision processes
Analyzing complexity of reward divergence in transient and recurrent states
Solving multiplicative stochastic shortest path problems in MDPs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiplicative rewards in Markov chains
Polynomial-time computation for MCs given a CSRI oracle
Polynomial-space algorithm for the optimal value in MDPs