🤖 AI Summary
To address the high sample complexity and estimation inaccuracy of off-policy evaluation (OPE) in high-dimensional state spaces, this paper introduces state abstraction into the OPE framework for the first time. We propose a backward-model-irrelevance condition tailored to OPE and construct a time-reversed Markov decision process (MDP) based on it. Building on this condition, we design an iterative deep abstraction algorithm that guarantees Fisher consistency of standard OPE estimators, such as (marginalized) importance sampling, in the abstracted state space. Theoretically, our method substantially reduces the sample complexity of OPE, and the abstraction procedure is agnostic to both the target policy and the environment dynamics. Our core contributions are threefold: (i) establishing the first theoretical foundation for OPE-oriented state abstraction; (ii) introducing a verifiable backward-irrelevance condition; and (iii) showing that statistical consistency and computational efficiency can be achieved simultaneously in OPE.
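To make the iterative construction concrete, the sketch below illustrates the composition pattern such a procedure follows: each step projects the current state representation into a smaller space, and the composition yields the deeply-abstracted state. This is a minimal illustration under stated assumptions, not the paper's algorithm; the maps `phi_1` and `phi_2` and the helper `deep_abstraction` are hypothetical placeholders (in the paper the projections are learned, not fixed coordinate selections).

```python
import numpy as np

def deep_abstraction(states, abstraction_steps):
    """Compose a sequence of abstraction maps phi_1, ..., phi_K,
    each projecting the current representation into a smaller space,
    yielding the deeply-abstracted state phi_K(...phi_1(s))."""
    abstracted = states
    for phi in abstraction_steps:
        abstracted = phi(abstracted)
    return abstracted

# Hypothetical example: two fixed projections that successively
# coarsen a 10-dimensional state down to 2 dimensions.
phi_1 = lambda s: s[..., :5]   # keep the first 5 coordinates
phi_2 = lambda s: s[..., :2]   # then keep the first 2
states = np.random.randn(100, 10)   # batch of offline states
z = deep_abstraction(states, [phi_1, phi_2])
print(z.shape)  # (100, 2)
```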
📝 Abstract
Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions, originally designed for policy learning, in the context of OPE. Our contributions are three-fold: (i) We define a set of irrelevance conditions central to learning state abstractions for OPE, and derive a backward-model-irrelevance condition for achieving irrelevance in (marginalized) importance sampling ratios by constructing a time-reversed Markov decision process (MDP). (ii) We propose a novel iterative procedure that sequentially projects the original state space into a smaller space, resulting in a deeply-abstracted state, which substantially reduces the sample complexity of OPE arising from the high cardinality of the state space. (iii) We prove the Fisher consistency of various OPE estimators when applied to our proposed abstract state spaces.
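For readers unfamiliar with these estimators, here is a minimal sketch of sequential importance sampling computed on abstracted states, assuming an abstraction map is given; `phi`, `pi_target`, and `pi_behavior` are hypothetical stand-ins, not the paper's implementation. The backward-model-irrelevance condition is what licenses replacing the original states with `phi(s)` inside the ratios without biasing the estimate.

```python
import numpy as np

def sequential_is_estimate(trajs, pi_target, pi_behavior, phi, gamma=0.99):
    """Sequential importance-sampling OPE computed on abstracted states.

    trajs: list of trajectories, each a list of (state, action, reward).
    pi_target, pi_behavior: functions (abstract_state, action) -> probability.
    phi: state-abstraction map; for a backward-model-irrelevant phi the
    ratios below agree with those computed in the original state space.
    """
    values = []
    for traj in trajs:
        rho, value = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            z = phi(s)                                   # abstract the state
            rho *= pi_target(z, a) / pi_behavior(z, a)   # cumulative ratio
            value += (gamma ** t) * rho * r
        values.append(value)
    return float(np.mean(values))
```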