Off-policy Evaluation with Deeply-abstracted States

📅 2024-06-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high sample complexity and estimation inaccuracy of off-policy evaluation (OPE) in high-dimensional state spaces, this paper introduces state abstraction into the OPE framework. The authors propose a backward-model-irrelevance condition tailored to OPE, constructed via a time-reversed Markov decision process (MDP), and build on it an iterative deep-abstraction algorithm that guarantees Fisher consistency of mainstream OPE estimators, such as (marginalized) importance sampling, on the abstracted state space. Theoretically, the method substantially reduces the sample complexity of OPE arising from large state spaces. The core contributions are threefold: (i) a set of irrelevance conditions grounding state abstraction for OPE; (ii) a backward-model-irrelevance condition derived from the time-reversed MDP; and (iii) Fisher-consistency guarantees for standard OPE estimators applied to the abstracted state space.

📝 Abstract
Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions -- originally designed for policy learning -- in the context of OPE. Our contributions are three-fold: (i) We define a set of irrelevance conditions central to learning state abstractions for OPE, and derive a backward-model-irrelevance condition for achieving irrelevance in (marginalized) importance sampling ratios by constructing a time-reversed Markov decision process (MDP). (ii) We propose a novel iterative procedure that sequentially projects the original state space into a smaller space, resulting in a deeply-abstracted state, which substantially reduces the sample complexity of OPE arising from high cardinality. (iii) We prove the Fisher consistencies of various OPE estimators when applied to our proposed abstract state spaces.
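To make the importance-sampling side of the abstract concrete, here is a minimal sketch of per-decision importance sampling for OPE with a state-abstraction map applied before the policies are queried. All names (`pi_target`, `pi_behavior`, `phi`) are illustrative assumptions, not the paper's API, and the abstraction map here is a placeholder rather than one learned via the paper's backward-model-irrelevance condition.

```python
import numpy as np

def is_ope_estimate(trajectories, pi_target, pi_behavior,
                    phi=lambda s: s, gamma=0.99):
    """Per-decision importance sampling estimate of the target policy's value.

    Both policies are evaluated on the abstracted state phi(s); with a valid
    abstraction, the ratios (and hence the estimate) are unchanged.
    """
    estimates = []
    for traj in trajectories:  # traj: iterable of (state, action, reward)
        ratio, value = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            z = phi(s)  # query policies on the abstracted state
            ratio *= pi_target(a, z) / pi_behavior(a, z)  # cumulative IS ratio
            value += (gamma ** t) * ratio * r  # per-decision IS term
        estimates.append(value)
    return float(np.mean(estimates))
```

When the target and behavior policies coincide, every ratio is 1 and the estimate reduces to the mean discounted return of the logged trajectories.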
Problem

Research questions and friction points this paper is trying to address.

Addresses challenges in off-policy evaluation in large state spaces.
Proposes state abstractions to reduce the sample complexity of OPE.
Ensures Fisher consistency of OPE estimators in abstract state spaces.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defines irrelevance conditions for state abstractions.
Proposes an iterative state-space projection method.
Proves Fisher consistencies of OPE estimators.
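The iterative projection idea above can be sketched as composing a sequence of abstraction maps into a single "deep" abstraction, phi = phi_k ∘ ... ∘ phi_1. This is purely illustrative: the paper learns each projection from data, which is omitted here, and all names are hypothetical.

```python
def deep_abstraction(maps):
    """Compose a list of projection maps [phi_1, ..., phi_k] into one map
    that applies them in sequence, shrinking the state space at each step."""
    def phi(state):
        for m in maps:  # each m projects the current space into a smaller one
            state = m(state)
        return state
    return phi
```

For example, `deep_abstraction([lambda s: s // 2, lambda s: s % 3])` maps a raw integer state first through halving and then through a mod-3 projection.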
Meiling Hao
School of Statistics, University of International Business and Economics, Beijing, 100029, China
Pingfan Su
Department of Statistics, London School of Economics and Political Science, London, WC2A 2AE, United Kingdom
Liyuan Hu
Department of Statistics, London School of Economics and Political Science, London, WC2A 2AE, United Kingdom
Zoltan Szabo
Department of Statistics, London School of Economics and Political Science, London, WC2A 2AE, United Kingdom
Qingyuan Zhao
University of Cambridge
Statistics · Causal Inference · Selective Inference
Chengchun Shi
London School of Economics and Political Science
Large Language Models · Reinforcement Learning · Statistics