🤖 AI Summary
Offline reinforcement learning (RL) suffers from overfitting and poor policy generalization on small-scale datasets (<100K transitions). To address this, we propose Sparse-Reg, the first method to explicitly incorporate structured sparsity regularization into the offline RL objective. Sparse-Reg jointly constrains the sparsity of both the policy and Q-network parameters, suppressing overfitting directly in parameter space without requiring additional data or environment interaction. The approach combines structured sparsity regularization, conservative Q-function optimization, and behavior-cloning regularization, and is designed for continuous control tasks. On multiple small-scale offline benchmarks, Sparse-Reg consistently outperforms state-of-the-art methods, including BCQ and CQL, with an average performance improvement of 23.6% and a 2.1× gain in sample efficiency. These results show that structured sparsity regularization substantially improves robustness and generalization in low-data offline RL settings.
📝 Abstract
In this paper, we investigate the use of small datasets in the context of offline reinforcement learning (RL). While many common offline RL benchmarks employ datasets with over a million data points, many offline RL applications rely on considerably smaller datasets. We show that offline RL algorithms can overfit on small datasets, resulting in poor performance. To address this challenge, we introduce "Sparse-Reg": a regularization technique based on sparsity that mitigates overfitting in offline RL, enables effective learning in limited-data settings, and outperforms state-of-the-art baselines in continuous control.
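As a loose illustration of the core idea (not the paper's implementation), the sketch below shows how an L1 sparsity penalty on model weights suppresses overfitting on a small dataset: a plain-Python toy linear regression where two of three input features are noise. The function names and the regularization strength `lam` are hypothetical; Sparse-Reg applies the analogous penalty to policy and Q-network parameters during offline RL training.

```python
# Loose illustration (NOT the paper's implementation): an L1 sparsity
# penalty drives irrelevant weights toward zero, reducing overfitting
# on small datasets. `lam` is a hypothetical regularization strength.

def loss_and_grad(w, xs, ys, lam):
    """Mean squared error of a linear model plus an L1 sparsity penalty."""
    n = len(xs)
    grads = [0.0] * len(w)
    mse = 0.0
    for x, y in zip(xs, ys):
        pred = sum(wj * xj for wj, xj in zip(w, x))
        err = pred - y
        mse += err * err / n
        for j, xj in enumerate(x):
            grads[j] += 2.0 * err * xj / n
    # Subgradient of lam * sum(|w_j|): pushes weights toward exactly zero.
    for j, wj in enumerate(w):
        grads[j] += lam * (1.0 if wj > 0 else -1.0 if wj < 0 else 0.0)
    return mse + lam * sum(abs(wj) for wj in w), grads

def train(xs, ys, lam, steps=3000, lr=0.05):
    """Plain gradient descent on the regularized loss."""
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        _, g = loss_and_grad(w, xs, ys, lam)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w

# Four noisy samples: y is roughly 2 * x1; the last two features are noise.
xs = [(1.0, 0.3, -0.2), (2.0, -0.1, 0.4), (3.0, 0.2, 0.1), (4.0, -0.3, -0.1)]
ys = [2.1, 3.9, 6.2, 7.8]

w_plain = train(xs, ys, lam=0.0)   # unregularized: fits the noise features
w_sparse = train(xs, ys, lam=0.1)  # L1-regularized: noise weights shrink to ~0
print(w_plain, w_sparse)
```

With only four samples, the unregularized fit assigns a sizable weight to a noise feature, while the L1 penalty zeroes it out and leaves the true signal weight near 2; this parameter-space shrinkage, without any extra data, is the intuition behind the method.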