Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset

📅 2025-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address data redundancy, low training efficiency, and unknown optimal dataset size in offline reinforcement learning, this paper formulates data selection as a submodular optimization problem under gradient approximation. We theoretically establish the submodularity of the Actor-Critic objective function and propose an enhanced Orthogonal Matching Pursuit (OMP) algorithm grounded in this property. Our method requires no auxiliary networks or labeled data, performing efficient data pruning solely via original policy gradient estimates. Evaluated on multiple standard benchmarks, it achieves improved policy performance using only 10%–30% of the original dataset, accelerates training by 2.1–3.8×, and significantly reduces computational overhead. Moreover, it enables estimation of the minimal effective dataset size required for a given task.
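The core mechanic described above — greedily choosing samples whose weighted gradient combination approximates the full-dataset gradient — can be sketched with a plain OMP loop. This is a minimal illustration under assumed inputs (a matrix of per-sample policy-gradient estimates), not the paper's exact ReDOR algorithm; the function name `omp_select` and the target (the mean gradient) are hypothetical simplifications.

```python
import numpy as np

def omp_select(grads, k):
    """Greedy OMP sketch: pick k sample gradients whose weighted sum
    best approximates the full-dataset mean gradient.

    grads: (n_samples, dim) array of per-sample gradient estimates.
    Returns the selected indices and their least-squares weights.
    """
    target = grads.mean(axis=0)        # full-dataset gradient to match
    residual = target.copy()
    selected = []
    weights = np.zeros(0)
    for _ in range(k):
        # score each candidate by its correlation with the residual
        scores = grads @ residual
        scores[selected] = -np.inf     # never re-pick a chosen sample
        selected.append(int(np.argmax(scores)))
        # re-fit least-squares weights over all selected gradients
        A = grads[selected].T          # (dim, |selected|)
        weights, *_ = np.linalg.lstsq(A, target, rcond=None)
        residual = target - A @ weights
    return selected, weights

# toy usage: prune 100 synthetic per-sample gradients down to 10
rng = np.random.default_rng(0)
grads = rng.normal(size=(100, 16))
subset, w = omp_select(grads, k=10)
```

The greedy correlation step plus least-squares re-fit is what makes this OMP rather than simple top-k scoring: each newly added sample is judged against the part of the target gradient the current subset cannot yet explain.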

📝 Abstract
Offline reinforcement learning (RL) represents a significant shift in RL research, allowing agents to learn from pre-collected datasets without further interaction with the environment. A key, yet underexplored, challenge in offline RL is selecting an optimal subset of the offline dataset that enhances both algorithm performance and training efficiency. Reducing dataset size can also reveal the minimal data requirements necessary for solving similar problems. In response to this challenge, we introduce ReDOR (Reduced Datasets for Offline RL), a method that frames dataset selection as a gradient approximation optimization problem. We demonstrate that the widely used actor-critic framework in RL can be reformulated as a submodular optimization objective, enabling efficient subset selection. To achieve this, we adapt orthogonal matching pursuit (OMP), incorporating several novel modifications tailored for offline RL. Our experimental results show that the data subsets identified by ReDOR not only boost algorithm performance but also do so with significantly lower computational complexity.
Problem

Research questions and friction points this paper is trying to address.

Optimizing dataset selection for offline RL
Reducing data requirements for efficient learning
Enhancing performance with lower computational complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reduced dataset selection
Gradient approximation optimization
Orthogonal matching pursuit adaptation