Markov Decision Processes under External Temporal Processes

📅 2023-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses non-stationary environments in real-world settings, where exogenous temporal events continually alter the environment's dynamics, challenging the stationary-MDP assumption standard in reinforcement learning. We formalize MDPs whose non-stationarity is governed by an external temporal process and derive conditions under which the problem becomes tractable. Within this framework, we propose history-dependent policies conditioned on both the current state and the observed sequence of exogenous events, and design a tailored policy iteration algorithm. Theoretically, we prove convergence of the algorithm under this form of non-stationarity and derive sample complexity bounds that incorporate defining factors of the exogenous process. Empirically, we demonstrate our findings in a traditional control environment. Our core contribution is a rigorous theoretical foundation for reinforcement learning under exogenously driven non-stationarity.
📝 Abstract
Most reinforcement learning algorithms treat the context under which they operate as a stationary, isolated, and undisturbed environment. However, in real-world applications, environments constantly change due to a variety of external events. To address this problem, we study Markov Decision Processes (MDPs) under the influence of an external temporal process. First, we formalize this notion and derive conditions under which the problem becomes tractable with suitable solutions. We propose a policy iteration algorithm to solve this problem and theoretically analyze its performance. Our analysis addresses the non-stationarity present in the MDP as a result of non-Markovian events, necessitating the formulation of policies that are contingent upon both the current state and a history of prior events. Additionally, we derive insights regarding the sample complexity of the algorithm and incorporate factors that define the exogenous temporal process into the established bounds. Finally, we perform experiments to demonstrate our findings within a traditional control environment.
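As a rough illustration of the history-dependent policies the abstract describes (a sketch, not the paper's actual construction), one can key a tabular policy on the current state together with a truncated window of the k most recent exogenous events; the window size k, the tabular representation, and all names here are assumptions of this illustration:

```python
class HistoryDependentPolicy:
    """Tabular policy keyed on (state, window of the k most recent events).

    Hypothetical sketch: truncating the event history to a fixed window of
    size k is a simplifying assumption, not the paper's exact formulation.
    """

    def __init__(self, k, default_action=0):
        self.k = k                      # number of recent events the policy sees
        self.default_action = default_action
        self.table = {}                 # (state, event window) -> action

    def _key(self, state, events):
        # Truncate the full event history to the k most recent events.
        return (state, tuple(events)[-self.k:])

    def act(self, state, events):
        return self.table.get(self._key(state, events), self.default_action)

    def set_action(self, state, events, action):
        self.table[self._key(state, events)] = action
```

Because the key depends only on the last k events, two histories that agree on their recent suffix map to the same action, which is what makes the finite-history restriction tractable.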
Problem

Research questions and friction points this paper is trying to address.

Addresses non-stationary Markov Decision Processes influenced by external temporal events
Establishes conditions under which the problem becomes tractable when only a finite history of events is considered
Develops a policy iteration algorithm that conditions on both the environment state and the event history
Innovation

Methods, ideas, or system contributions that make the work stand out.

MDPs influenced by an external temporal process
Policy iteration over policies conditioned on a finite event history
Sample complexity analysis of the finite-history approximation
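The finite-history idea above admits a standard reduction: pair each environment state with the window of recent events and run ordinary policy iteration on the augmented state space. A hedged sketch under simplifying assumptions (the augmented kernel `P`, reward `R`, and tabular setting are inputs of this illustration, not the paper's algorithm):

```python
def policy_iteration(aug_states, actions, P, R, gamma=0.9, tol=1e-8):
    """Policy iteration on an augmented state space.

    Each augmented state is imagined as a pair (environment state, window of
    recent exogenous events); with a fixed window the augmented process is
    treated as Markovian. P[s][a] maps next augmented states to transition
    probabilities and R[s][a] is the expected reward -- both are assumed
    given, which glosses over how the event process itself is modeled.
    """
    policy = {s: actions[0] for s in aug_states}
    V = {s: 0.0 for s in aug_states}
    while True:
        # Policy evaluation: iterate the Bellman expectation backup to a fixed point.
        while True:
            delta = 0.0
            for s in aug_states:
                a = policy[s]
                v = R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the evaluated values.
        stable = True
        for s in aug_states:
            best = max(actions,
                       key=lambda a: R[s][a]
                       + gamma * sum(p * V[t] for t, p in P[s][a].items()))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V
```

On finite augmented state spaces this loop converges in finitely many improvement steps, which is the classical guarantee the paper's analysis extends to the non-stationary setting.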