Sample Efficient Experience Replay in Non-stationary Environments

πŸ“… 2025-09-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
In non-stationary environments, rapidly changing dynamics quickly render historical experiences obsolete, while conventional TD-error-based prioritized replay cannot distinguish errors arising from policy updates from those induced by environment shifts, limiting learning efficiency. To address this, we propose DEER, Discrepancy of Environment Prioritized Experience Replay. First, we formalize the Discrepancy of Environment Dynamics (DoE), a metric that isolates the effect of environment shifts on value functions. Second, we design a classifier-based adaptive sampling mechanism that reweights experience priorities upon detecting an environment switch, applying distinct prioritization strategies before and after each shift. Third, we integrate this value-function discrepancy signal with off-policy optimization to control experience reuse precisely. Evaluated on four standard non-stationary benchmarks, DEER achieves an average performance gain of 11.54% over the strongest experience replay baselines, improving both sample efficiency and environmental adaptability.
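The summary names the DoE metric without giving its form. Below is a minimal sketch of one plausible reading, assuming DoE is the average drift in bootstrapped value targets for the same stored transitions when re-evaluated under a frozen policy; the name `doe` and the `q_before`/`q_after` callables are illustrative assumptions, not the paper's API.

```python
import numpy as np

def doe(transitions, q_before, q_after, gamma=0.99):
    """Average drift in bootstrapped value targets for the same stored
    transitions, re-evaluated under a frozen policy. Because the policy
    is held fixed, any target drift is attributable to the environment
    rather than to policy updates. Illustrative reading of DoE, not the
    paper's exact formula."""
    drifts = []
    for _s, _a, r, s_next, done in transitions:
        v_old = 0.0 if done else float(np.max(q_before(s_next)))
        v_new = 0.0 if done else float(np.max(q_after(s_next)))
        drifts.append(abs((r + gamma * v_new) - (r + gamma * v_old)))
    return float(np.mean(drifts))
```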

πŸ“ Abstract
Reinforcement learning (RL) in non-stationary environments is challenging, as changing dynamics and rewards quickly make past experiences outdated. Traditional experience replay (ER) methods, especially those using TD-error prioritization, struggle to distinguish between changes caused by the agent's policy and those from the environment, resulting in inefficient learning under dynamic conditions. To address this challenge, we propose the Discrepancy of Environment Dynamics (DoE), a metric that isolates the effects of environment shifts on value functions. Building on this, we introduce Discrepancy of Environment Prioritized Experience Replay (DEER), an adaptive ER framework that prioritizes transitions based on both policy updates and environmental changes. DEER uses a binary classifier to detect environment changes and applies distinct prioritization strategies before and after each shift, enabling more sample-efficient learning. Experiments on four non-stationary benchmarks demonstrate that DEER further improves the performance of off-policy algorithms by 11.54 percent compared to the best-performing state-of-the-art ER methods.
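The abstract describes DEER's mechanism at a high level: detect an environment change, then apply distinct prioritization to pre-shift and post-shift transitions on top of TD-error priorities. A minimal sketch of that idea follows, assuming a simple flat buffer; `pre_shift_scale`, `alpha`, and the `mark_shift` interface are assumptions rather than the paper's specification.

```python
import numpy as np

class DEERBuffer:
    """Sketch of shift-aware prioritized replay in the spirit of DEER.

    Transitions carry a flag saying whether they predate the most recent
    detected environment change; pre-shift samples keep their TD-error
    priority but are down-weighted, so fresh post-shift experience
    dominates sampling. All hyperparameters here are illustrative."""

    def __init__(self, capacity=100_000, alpha=0.6, pre_shift_scale=0.3):
        self.capacity = capacity
        self.alpha = alpha                      # PER exponent (Schaul et al., 2016)
        self.pre_shift_scale = pre_shift_scale  # down-weights stale experience
        self.data, self.prios, self.pre_shift = [], [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:     # drop oldest when full
            self.data.pop(0)
            self.prios.pop(0)
            self.pre_shift.pop(0)
        self.data.append(transition)
        self.prios.append(abs(td_error) + 1e-6)  # epsilon keeps priorities positive
        self.pre_shift.append(False)             # new data is post-shift

    def mark_shift(self):
        """Call when the change detector fires: everything stored so far
        becomes pre-shift and is sampled less aggressively."""
        self.pre_shift = [True] * len(self.data)

    def sample(self, batch_size):
        scale = np.where(self.pre_shift, self.pre_shift_scale, 1.0)
        probs = (np.array(self.prios) ** self.alpha) * scale
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx
```

Calling `mark_shift()` whenever a change is detected keeps stale experience available but rarely sampled, matching the abstract's goal of separating policy-driven from environment-driven error.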
Problem

Research questions and friction points this paper is trying to address.

Addresses reinforcement learning challenges in non-stationary environments
Proposes the DoE metric to isolate environment-shift effects on value functions
Develops an adaptive experience replay framework for dynamic conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

DEER prioritizes transitions by both policy updates and environment changes
Uses a binary classifier to detect environment shifts (a minimal sketch follows this list)
Applies distinct prioritization strategies before and after each shift
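The binary classifier mentioned above can be read as a two-sample test: train a classifier to separate a recent window of transition features from an older reference window, and flag a shift when it separates them well above chance. A hedged sketch, assuming each row is a feature vector for one transition (e.g. concatenated state, action, next state); `detect_shift`, the logistic-regression model, and the 0.7 accuracy threshold are assumptions, not the paper's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def detect_shift(recent, reference, threshold=0.7):
    """Two-sample test via a binary classifier: try to tell a recent
    window of transition features apart from an older reference window.
    Training accuracy near 0.5 means the windows are indistinguishable
    (no shift); accuracy well above chance suggests the dynamics
    changed. Threshold and model choice are illustrative assumptions."""
    X = np.vstack([recent, reference])
    y = np.concatenate([np.ones(len(recent)), np.zeros(len(reference))])
    clf = LogisticRegression(max_iter=200).fit(X, y)
    return clf.score(X, y) > threshold
```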
πŸ”Ž Similar Papers
No similar papers found.
Authors
Tianyang Duan, Department of Computer Science, The University of Hong Kong, Hong Kong, China
Zongyuan Zhang, Department of Computer Science, The University of Hong Kong, Hong Kong, China
Songxiao Guo, Department of Computer Science, The University of Hong Kong, Hong Kong, China
Yuanye Zhao, College of International Education, Hebei University of Economics and Business, China
Zheng Lin, Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
Zihan Fang, Department of Computer Science, City University of Hong Kong, Hong Kong, China
Yi Liu, Department of Computer Science, City University of Hong Kong, Hong Kong, China
Dianxin Luan, Institute for Imaging, Data and Communications, University of Edinburgh, UK
Dong Huang, School of Computing, National University of Singapore, Singapore
Heming Cui, University of Hong Kong (Operating Systems, Programming Languages, Distributed Systems, Security)
Yong Cui, Professor of Computer Science, Tsinghua University (Network Architecture, Mobile Computing)