Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets

📅 2025-11-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses dataset distillation for supervised learning and offline reinforcement learning: how to generate a compact synthetic dataset from raw training data such that models trained on it achieve performance comparable to those trained on the full dataset. We propose a model-free loss-matching distillation framework that uses a fixed set of randomly sampled regressors to closely approximate the mean squared error (MSE) loss in supervised learning or the Bellman error in offline RL, thereby preserving the essential optimization properties of the original data. Theoretically, we prove that only Õ(d²) sampled regressors suffice to guarantee near-optimal loss for any bounded linear model in regression; for offline RL, we introduce the first unified modeling of state transitions and rewards within Bellman loss matching, achieving the theoretical lower bound. Empirically, our method significantly improves training efficiency and generalization on small distilled datasets across diverse supervised and offline RL benchmarks.
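The loss-matching idea can be sketched in a toy form: sample a fixed set of random linear regressors, record each one's MSE on the full training set, then optimize a small synthetic dataset so that its per-regressor losses match those targets. The sketch below is illustrative only; the problem sizes, unit-norm regressor sampling, and the plain gradient-descent solver are assumptions for the demo, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem sizes (illustrative, not from the paper)
d, n, m, K = 5, 1000, 20, 200     # dim, train size, synthetic size, #regressors
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Fixed set of randomly sampled bounded (here unit-norm) linear regressors;
# no model is ever trained to build this set.
W = rng.normal(size=(K, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

def mse_per_regressor(Xd, yd):
    """MSE of every sampled regressor w_k on the dataset (Xd, yd)."""
    return ((Xd @ W.T - yd[:, None]) ** 2).mean(axis=0)   # shape (K,)

target = mse_per_regressor(X, y)   # losses to match, computed once

# Optimize the synthetic dataset (Xs, ys) by gradient descent on the
# loss-matching objective F = mean_k (L_k(Xs, ys) - target_k)^2.
Xs = rng.normal(size=(m, d))
ys = rng.normal(size=m)
gap_init = np.abs(mse_per_regressor(Xs, ys) - target).mean()

lr = 0.05
for _ in range(3000):
    res = Xs @ W.T - ys[:, None]                # residuals, shape (m, K)
    diff = (res ** 2).mean(axis=0) - target     # per-regressor loss gap, (K,)
    weighted = res * diff[None, :]
    gXs = (4 / (m * K)) * (weighted @ W)        # dF/dXs
    gys = -(4 / (m * K)) * weighted.sum(axis=1) # dF/dys
    Xs -= lr * gXs
    ys -= lr * gys

gap_final = np.abs(mse_per_regressor(Xs, ys) - target).mean()

# With the loss landscapes aligned, least-squares fits on either dataset
# should land close together.
w_full = np.linalg.lstsq(X, y, rcond=None)[0]
w_syn = np.linalg.lstsq(Xs, ys, rcond=None)[0]
```

Because every linear model's MSE depends on the data only through a handful of second-moment statistics, a 20-point synthetic set has enough degrees of freedom to drive the per-regressor loss gaps down, which is the intuition behind the Õ(d²) regressor count.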

📝 Abstract
Given a training dataset, the goal of dataset distillation is to derive a synthetic dataset such that models trained on the latter perform as well as those trained on the training dataset. In this work, we develop and analyze an efficient dataset distillation algorithm for supervised learning, specifically regression in $\mathbb{R}^d$, based on matching the losses on the training and synthetic datasets with respect to a fixed set of randomly sampled regressors without any model training. Our first key contribution is a novel performance guarantee proving that our algorithm needs only $\tilde{O}(d^2)$ sampled regressors to derive a synthetic dataset on which the MSE loss of any bounded linear model is nearly the same as its MSE loss on the given training data. In particular, the model optimized on the synthetic data has close to minimum loss on the training data, thus performing nearly as well as the model optimized on the latter. Complementing this, we also prove a matching lower bound of $\Omega(d^2)$ for the number of sampled regressors, showing the tightness of our analysis. Our second contribution is to extend our algorithm to offline RL dataset distillation by matching the Bellman loss, unlike previous works which used a behavioral cloning objective. This is the first such method that leverages both the rewards and the next-state information available in offline RL datasets, without any policy model optimization. Our algorithm generates a synthetic dataset whose Bellman loss with respect to any linear action-value predictor is close to the latter's Bellman loss on the offline RL training dataset. Therefore, a policy associated with an action-value predictor optimized on the synthetic dataset performs nearly as well as that derived from the one optimized on the training data. We conduct experiments to validate our theoretical guarantees and observe performance gains.
Problem

Research questions and friction points this paper is trying to address.

Develop efficient dataset distillation algorithm for supervised regression
Extend algorithm to offline RL by matching Bellman loss
Provide theoretical guarantees and lower bounds for algorithm performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Algorithm distills datasets via loss matching without model training
Uses Õ(d^2) random regressors for supervised learning guarantees
Extends to offline RL by matching Bellman loss with rewards and states
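The offline RL extension matches, for each sampled linear action-value predictor, an empirical Bellman error that uses both the logged rewards and the next states. A minimal sketch of that per-regressor Bellman loss is below; it assumes linear Q over state-action features and a fixed evaluation policy (so the next-state value is a linear form rather than a max over actions), and all names, sizes, and features are hypothetical. A synthetic dataset would then be optimized to match these losses, exactly as in the supervised case.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical offline RL dataset: features of logged (s, a) pairs, rewards,
# and features of (s', pi(s')) for a fixed evaluation policy pi.
d, n, K, gamma = 4, 500, 100, 0.9
Phi = rng.normal(size=(n, d))        # phi(s, a) for each logged transition
R = rng.normal(size=n)               # logged rewards
Phi_next = rng.normal(size=(n, d))   # phi(s', pi(s')) from logged next states

# Fixed set of randomly sampled linear action-value predictors
# Q_w(s, a) = phi(s, a)^T w; no policy model is optimized.
W = rng.normal(size=(K, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

def bellman_loss_per_regressor(phi, r, phi_next):
    """Empirical Bellman error of each sampled w: the mean over transitions of
    (phi^T w - r - gamma * phi_next^T w)^2, using rewards AND next states."""
    td_err = phi @ W.T - r[:, None] - gamma * (phi_next @ W.T)  # (n, K)
    return (td_err ** 2).mean(axis=0)                           # (K,)

# Per-regressor Bellman losses on the offline dataset: these are the targets
# a synthetic dataset (Phi_s, R_s, Phi_next_s) would be optimized to match.
target = bellman_loss_per_regressor(Phi, R, Phi_next)
```

Matching this quantity, rather than a behavioral-cloning objective, is what lets the distilled dataset preserve the Bellman loss landscape, so a Q-function fit on the synthetic data induces a near-optimal policy on the original problem.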