🤖 AI Summary
Offline reinforcement learning (RL) faces two key challenges: the scarcity of high-quality offline datasets and the poor generalization of learned policies. To address these, this work introduces dataset distillation to offline RL for the first time, proposing a policy-oriented, environment-free offline data distillation framework. The method combines gradient matching, behavior-cloning optimization, and meta-learning principles in a differentiable data-synthesis pipeline that extracts a compact, high-information-density distilled dataset from the raw offline data. Across multiple standard benchmarks, policies trained solely on a distilled dataset roughly 10% the size of the original match the performance of policies trained on the full dataset or on high-percentile behavior-cloning baselines, yielding over 90% data compression. This establishes a new paradigm for data-efficient, robust offline RL that reduces reliance on large-scale expert demonstrations while preserving policy performance.
📝 Abstract
Offline reinforcement learning typically requires a high-quality dataset on which to train a policy. In many situations, however, such a dataset is unavailable, and it is difficult to train a policy that performs well in the actual environment from the offline data alone. We propose using dataset distillation to synthesize a better dataset, which can then be used to train a better policy model. We show that our method synthesizes a dataset on which a trained model achieves performance comparable to a model trained on the full dataset or a model trained with percentile behavioral cloning. Our project site is available [here](https://datasetdistillation4rl.github.io), and our implementation is available in [this GitHub repository](https://github.com/ggflow123/DDRL).