🤖 AI Summary
Conventional experience replay suffers from limited experience diversity and low sampling efficiency in high-dimensional, complex environments such as real-world robotic manipulation and 3D indoor navigation. To address this, the paper proposes an experience replay framework based on Determinantal Point Processes (DPPs). The paper claims to be the first to use DPPs to quantify experience diversity and to prioritize replay by diversity. For scalability, it uses Cholesky decomposition to accelerate kernel matrix computation and adds rejection sampling to enable efficient, unbiased sampling in high-dimensional state spaces. Evaluated on MuJoCo robotic manipulation tasks, Atari games, and Habitat-based realistic indoor navigation tasks, the method significantly improves sample efficiency and final policy performance. The gains hold across all three domains, which suggests that the approach generalizes and is applicable to real-world embodied AI tasks.
📝 Abstract
Experience replay is widely used to improve learning efficiency in reinforcement learning by leveraging past experiences. However, existing experience replay methods, whether based on uniform or prioritized sampling, often suffer from low efficiency, particularly in real-world scenarios with high-dimensional state spaces. To address this limitation, we propose a novel approach, Efficient Diversity-based Experience Replay (EDER). EDER employs a determinantal point process (DPP) to model the diversity between samples and prioritizes replay according to that diversity. To further enhance learning efficiency, we incorporate Cholesky decomposition for handling large state spaces in realistic environments. Additionally, rejection sampling is applied to select samples with higher diversity, thereby improving overall learning efficacy. Extensive experiments are conducted on robotic manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat. The results demonstrate that our approach not only significantly improves learning efficiency but also achieves superior performance in high-dimensional, realistic environments.
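
The three ingredients named in the abstract (a DPP diversity measure, Cholesky decomposition, and rejection sampling) can be illustrated with a minimal sketch. This is not the authors' implementation. The RBF similarity kernel, the `bandwidth` and `jitter` values, the acceptance rule, and the function names `rbf_kernel`, `diversity_score`, and `sample_diverse_batch` are all assumptions chosen for illustration. Under a DPP with kernel L, a subset is weighted in proportion to the determinant of its kernel submatrix, so a batch of near-duplicate states scores far lower than a batch of well-separated ones.

```python
import numpy as np


def rbf_kernel(states, bandwidth=1.0):
    """Similarity kernel L[i, j] = exp(-||s_i - s_j||^2 / (2 * bandwidth^2)).

    The RBF form is an assumption made for this sketch.
    """
    sq_norms = np.sum(states ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * states @ states.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip small negatives from round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def diversity_score(states, bandwidth=1.0, jitter=1e-6):
    """Log-determinant of the batch kernel, computed from its Cholesky factor.

    For L = C C^T, log det(L) = 2 * sum(log(diag(C))), which avoids forming
    the determinant directly. The jitter term (an assumption) keeps the
    matrix positive definite when states are nearly identical.
    """
    kernel = rbf_kernel(states, bandwidth) + jitter * np.eye(len(states))
    chol = np.linalg.cholesky(kernel)
    return 2.0 * np.sum(np.log(np.diag(chol)))


def sample_diverse_batch(buffer, batch_size, rng, max_tries=50, bandwidth=1.0):
    """Hypothetical rejection sampler over uniformly proposed batches.

    A proposed batch is accepted with probability min(1, det(L_batch)).
    With a unit-diagonal kernel the determinant is at most about 1, so it
    can serve as an acceptance probability. If nothing is accepted within
    max_tries, the most diverse proposal seen so far is returned.
    """
    best_idx, best_score = None, -np.inf
    for _ in range(max_tries):
        idx = rng.choice(len(buffer), size=batch_size, replace=False)
        score = diversity_score(buffer[idx], bandwidth)
        if score > best_score:
            best_idx, best_score = idx, score
        if rng.random() < np.exp(score):
            return idx, score
    return best_idx, best_score


# Toy replay buffer: 64 stored states with 3 features each.
rng = np.random.default_rng(0)
buffer = rng.normal(size=(64, 3))
batch_idx, batch_score = sample_diverse_batch(buffer, batch_size=8, rng=rng)
```

In this sketch the Cholesky factor is recomputed for every proposed batch, which costs cubic time in the batch size. That is acceptable for small batches, but a practical system would likely need incremental updates or a low-rank kernel.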