🤖 AI Summary
Offline reinforcement learning (RL) faces two key challenges: the scarcity of high-quality offline datasets and the poor generalization of learned policies. To address these, this work introduces dataset distillation to offline RL for the first time, proposing a policy-oriented, environment-free offline data distillation framework. The method combines gradient matching, behavior-cloning optimization, and meta-learning principles in a differentiable data-synthesis pipeline that extracts a compact, high-information-density distilled dataset from the raw offline data. Across multiple standard benchmarks, policies trained solely on a distilled dataset roughly 10% the size of the original match the performance of policies trained on the full dataset or on high-percentile behavior-cloning baselines, yielding over 90% data compression. This establishes a new paradigm for data-efficient, robust offline RL that reduces reliance on large-scale expert demonstrations while preserving policy performance.
📝 Abstract
Offline reinforcement learning typically requires a high-quality dataset on which to train a policy. In many situations, however, such a dataset is unavailable, and it is difficult to train a policy that performs well in the actual environment from the offline data alone. We propose using dataset distillation to synthesize a better dataset, which can then be used to train a better policy model. We show that our method synthesizes a dataset on which a trained model achieves performance comparable to a model trained on the full dataset or a model trained with percentile behavioral cloning. Our project site is available [here](https://datasetdistillation4rl.github.io), and our implementation is available in [this GitHub repository](https://github.com/ggflow123/DDRL).