AVG-DICE: Stationary Distribution Correction by Regression

📅 2025-03-03
🤖 AI Summary
To address bias and instability in off-policy evaluation (OPE) arising from mismatched stationary state distributions between behavior and target policies, this paper introduces AVG-DICE, a Monte Carlo density-ratio estimator that averages discounted importance sampling ratios. AVG-DICE estimates the discounted state density ratio directly from samples, without backward Bellman-based updates or expensive iterative optimization, and extends to nonlinear function approximation through regression. The estimator is unbiased, consistent, and computationally simple. On OPE tasks based on MuJoCo Gym environments, AVG-DICE is at least as accurate as state-of-the-art density-ratio estimators and sometimes improves on them by orders of magnitude. A sensitivity analysis shows, however, that the best-performing hyperparameters can vary substantially across discount factors, so re-tuning is recommended when the discount factor changes. The core contribution is casting the average of discounted importance sampling ratios as a regression target for density-ratio estimation, which gives a theoretically grounded and lightweight approach to OPE.

📝 Abstract
Off-policy policy evaluation (OPE), an essential component of reinforcement learning, has long suffered from stationary state distribution mismatch, undermining both stability and accuracy of OPE estimates. While existing methods correct distribution shifts by estimating density ratios, they often rely on expensive optimization or backward Bellman-based updates and struggle to outperform simpler baselines. We introduce AVG-DICE, a computationally simple Monte Carlo estimator for the density ratio that averages discounted importance sampling ratios, providing an unbiased and consistent correction. AVG-DICE extends naturally to nonlinear function approximation using regression, which we roughly tune and test on OPE tasks based on Mujoco Gym environments and compare with state-of-the-art density-ratio estimators using their reported hyperparameters. In our experiments, AVG-DICE is at least as accurate as state-of-the-art estimators and sometimes offers orders-of-magnitude improvements. However, a sensitivity analysis shows that best-performing hyperparameters may vary substantially across different discount factors, so a re-tuning is suggested.
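
As a rough illustration of what "averaging discounted importance sampling ratios" can look like, the sketch below computes a tabular density-ratio estimate from behavior-policy trajectories. It is one plausible reading of the abstract, not the paper's exact estimator: the function name, the (state, rho) trajectory format, the assumption of equal-length trajectories, and the per-visit weight T(1 - gamma) * gamma^t * (product of earlier ratios) are all assumptions made for this example.

```python
from collections import defaultdict


def average_discounted_is_ratios(trajectories, gamma):
    """Tabular sketch: estimate w(s) ~ d_pi_gamma(s) / d_data(s).

    trajectories: list of trajectories collected under the behavior policy.
        Each trajectory is a list of (state, rho) pairs, where
        rho = pi(a|s) / mu(a|s) for the action taken in that state.
        (Hypothetical data format chosen for this example.)
    gamma: discount factor in [0, 1).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for traj in trajectories:
        horizon = len(traj)
        # Product of ratios for the actions taken BEFORE reaching the
        # current state; it is 1 at the first time step.
        cum_rho = 1.0
        for t, (state, rho) in enumerate(traj):
            # Discounted importance sampling ratio for this visit.
            sums[state] += horizon * (1.0 - gamma) * (gamma ** t) * cum_rho
            counts[state] += 1
            cum_rho *= rho
    # Average the discounted ratios over all visits to each state.
    return {state: sums[state] / counts[state] for state in sums}


# Usage: one two-step trajectory where the target policy is twice as
# likely as the behavior policy to take the first action.
ratios = average_discounted_is_ratios([[("a", 2.0), ("b", 1.0)]], gamma=0.5)
```

With a finite horizon T the discounted weights sum to 1 - gamma^T rather than 1, so this sketch targets the truncated discounted distribution. An OPE estimate would then be a ratio-weighted average of observed rewards.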
Problem

Research questions and friction points this paper is trying to address.

Addresses stationary state distribution mismatch in off-policy evaluation.
Introduces AVG-DICE for unbiased density ratio correction via regression.
Compares AVG-DICE with state-of-the-art density-ratio estimators on MuJoCo Gym tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

AVG-DICE: Monte Carlo estimator for density ratios
Uses regression for nonlinear function approximation
At least as accurate as state-of-the-art density-ratio estimators, with occasional orders-of-magnitude improvements
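
The regression step listed above can be sketched as a least-squares fit: each visited state contributes a Monte Carlo target (a discounted importance sampling ratio), and a function approximator is trained to predict the target from state features. The example below is a minimal sketch under stated assumptions, not the paper's implementation. It swaps in a linear model trained by full-batch gradient descent in place of a neural network, and the function name, feature format, learning rate, and epoch count are hypothetical.

```python
def fit_ratio_regression(features, targets, lr=0.1, epochs=500):
    """Fit w(s) ~ theta . phi(s) by minimizing mean squared error.

    features: list of feature vectors phi(s), one per visited state.
    targets: list of Monte Carlo targets, one per visited state
        (for example, discounted importance sampling ratios).
    Returns the learned weight vector theta.
    """
    dim = len(features[0])
    n = len(features)
    theta = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for phi, y in zip(features, targets):
            # Prediction error of the current linear model on this sample.
            err = sum(th * x for th, x in zip(theta, phi)) - y
            for j, x in enumerate(phi):
                grad[j] += 2.0 * err * x / n
        # Full-batch gradient descent step on the mean squared error.
        theta = [th - lr * g for th, g in zip(theta, grad)]
    return theta


# Usage: with one-hot features the fit recovers the per-state average
# of the targets, matching a tabular estimate.
theta = fit_ratio_regression(
    features=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    targets=[0.5, 1.5, 2.0],
)
```

Squared-error regression is a natural choice here because its minimizer is the conditional mean of the targets given the state, which is the quantity an averaging estimator aims for.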
Fengdi Che
University of Alberta
Artificial Intelligence
Bryan Chan
University of Alberta
Reinforcement Learning, Machine Learning
Chen Ma
Department of Computing Science, University of Alberta, Canada
A. R. Mahmood
Department of Computing Science, University of Alberta, Canada; Alberta Machine Intelligence Institute (Amii); CIFAR AI Chair