Markov Decision Processes under External Temporal Processes

📅 2023-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses non-stationary environments in real-world settings, where exogenous temporal events continually alter the environment's dynamics, challenging the stationary-MDP assumption standard in reinforcement learning. We formalize MDPs whose non-stationarity is governed by an external temporal process and derive conditions under which the problem becomes tractable. Within this framework, we propose history-dependent policies conditioned on both the current state and the observed sequence of exogenous events, and design a tailored policy iteration algorithm. Theoretically, we prove convergence of the algorithm under this form of non-stationarity and derive sample complexity bounds that incorporate defining factors of the exogenous process. Empirically, we demonstrate our findings in a traditional control environment. Our core contribution is a rigorous theoretical foundation for reinforcement learning under exogenously driven non-stationarity.
📝 Abstract
Most reinforcement learning algorithms treat the context under which they operate as a stationary, isolated, and undisturbed environment. However, in real-world applications, environments constantly change due to a variety of external events. To address this problem, we study Markov Decision Processes (MDPs) under the influence of an external temporal process. First, we formalize this notion and derive conditions under which the problem becomes tractable with suitable solutions. We propose a policy iteration algorithm to solve this problem and theoretically analyze its performance. Our analysis addresses the non-stationarity present in the MDP as a result of non-Markovian events, necessitating the formulation of policies that are contingent upon both the current state and a history of prior events. Additionally, we derive insights regarding the sample complexity of the algorithm and incorporate factors that define the exogenous temporal process into the established bounds. Finally, we perform experiments to demonstrate our findings within a traditional control environment.
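As a rough illustration of the history-dependent policies the abstract describes (a sketch, not the paper's actual construction), one can key a tabular policy on the current state together with a truncated window of the k most recent exogenous events; the window size k, the tabular representation, and all names here are assumptions of this illustration:

```python
class HistoryDependentPolicy:
    """Tabular policy keyed on (state, window of the k most recent events).

    Hypothetical sketch: truncating the event history to a fixed window of
    size k is a simplifying assumption, not the paper's exact formulation.
    """

    def __init__(self, k, default_action=0):
        self.k = k                      # number of recent events the policy sees
        self.default_action = default_action
        self.table = {}                 # (state, event window) -> action

    def _key(self, state, events):
        # Truncate the full event history to the k most recent events.
        return (state, tuple(events)[-self.k:])

    def act(self, state, events):
        return self.table.get(self._key(state, events), self.default_action)

    def set_action(self, state, events, action):
        self.table[self._key(state, events)] = action
```

Because the key depends only on the last k events, two histories that agree on their recent suffix map to the same action, which is what makes the finite-history restriction tractable.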
Problem

Research questions and friction points this paper is trying to address.

Addresses non-stationary Markov Decision Processes influenced by external temporal events
Establishes conditions under which the problem becomes tractable when only a finite history of events is considered
Develops a policy iteration algorithm that conditions on both the environment state and the event history
Innovation

Methods, ideas, or system contributions that make the work stand out.

MDPs influenced by an external temporal process
Policy iteration over policies conditioned on a finite event history
Sample complexity analysis of the finite-history approximation
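The finite-history idea above admits a standard reduction: pair each environment state with the window of recent events and run ordinary policy iteration on the augmented state space. A hedged sketch under simplifying assumptions (the augmented kernel `P`, reward `R`, and tabular setting are inputs of this illustration, not the paper's algorithm):

```python
def policy_iteration(aug_states, actions, P, R, gamma=0.9, tol=1e-8):
    """Policy iteration on an augmented state space.

    Each augmented state is imagined as a pair (environment state, window of
    recent exogenous events); with a fixed window the augmented process is
    treated as Markovian. P[s][a] maps next augmented states to transition
    probabilities and R[s][a] is the expected reward -- both are assumed
    given, which glosses over how the event process itself is modeled.
    """
    policy = {s: actions[0] for s in aug_states}
    V = {s: 0.0 for s in aug_states}
    while True:
        # Policy evaluation: iterate the Bellman expectation backup to a fixed point.
        while True:
            delta = 0.0
            for s in aug_states:
                a = policy[s]
                v = R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the evaluated values.
        stable = True
        for s in aug_states:
            best = max(actions,
                       key=lambda a: R[s][a]
                       + gamma * sum(p * V[t] for t, p in P[s][a].items()))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V
```

On finite augmented state spaces this loop converges in finitely many improvement steps, which is the classical guarantee the paper's analysis extends to the non-stationary setting.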