🤖 AI Summary
This paper addresses dataset condensation (DC), formulating it as a probabilistic approximation problem between the source and synthetic distributions via a unified discrepancy-based framework. Unlike conventional DC approaches that solely optimize generalization performance, this work is the first to incorporate discrepancy theory into DC, enabling joint optimization of multiple objectives—including generalization, robustness, and privacy preservation. The authors design a differentiable distribution distance metric and integrate it with gradient-based optimization of synthetic samples, yielding highly compact synthetic sets ($M \ll N$). Experiments demonstrate that models trained from scratch on the condensed datasets achieve performance on par with or surpassing that of models trained on the full original datasets across all evaluated metrics—despite drastic reductions in data volume. These results validate the framework's effectiveness, versatility, and practical utility for efficient and multifaceted dataset condensation.
📝 Abstract
Given a dataset of finitely many elements $\mathcal{T} = \{\mathbf{x}_i\}_{i=1}^{N}$, the goal of dataset condensation (DC) is to construct a synthetic dataset $\mathcal{S} = \{\tilde{\mathbf{x}}_j\}_{j=1}^{M}$ which is significantly smaller ($M \ll N$) such that a model trained from scratch on $\mathcal{S}$ achieves comparable or even superior generalization performance to a model trained on $\mathcal{T}$. Recent advances in DC reveal a close connection to the problem of approximating the data distribution represented by $\mathcal{T}$ with a reduced set of points. In this work, we present a unified framework that encompasses existing DC methods and extend the task-specific notion of DC to a more general and formal definition using notions of discrepancy, which quantify the distance between probability distributions in different regimes. Our framework broadens the objective of DC beyond generalization, accommodating additional objectives such as robustness, privacy, and other desirable properties.
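To make the discrepancy-minimization view concrete, here is a minimal sketch (not the paper's actual method) of condensing $\mathcal{T}$ into a small synthetic set $\mathcal{S}$ by minimizing one particular differentiable discrepancy — the squared maximum mean discrepancy (MMD) under a Gaussian kernel — over the synthetic points with plain gradient descent. All function names, the kernel choice, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian kernel matrix between rows of a (n, d) and b (m, d).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(x, s, sigma=1.0):
    # Biased squared-MMD estimate between the real set x and synthetic set s.
    return (gaussian_kernel(x, x, sigma).mean()
            - 2 * gaussian_kernel(x, s, sigma).mean()
            + gaussian_kernel(s, s, sigma).mean())

def mmd2_grad(x, s, sigma=1.0):
    # Analytic gradient of mmd2 with respect to the synthetic points s.
    N, M = len(x), len(s)
    Kxs = gaussian_kernel(x, s, sigma)       # (N, M)
    Kss = gaussian_kernel(s, s, sigma)       # (M, M)
    diff_xs = x[:, None, :] - s[None, :, :]  # (N, M, d): x_i - s_j
    diff_ss = s[:, None, :] - s[None, :, :]  # (M, M, d): s_i - s_j
    g_cross = -(2.0 / (N * M)) * (Kxs[..., None] * diff_xs).sum(0) / sigma**2
    g_self = (2.0 / M**2) * (Kss[..., None] * diff_ss).sum(0) / sigma**2
    return g_cross + g_self

def condense(T, M=10, steps=200, lr=0.5, sigma=1.0, seed=0):
    # Initialize S with M random real points, then descend on the discrepancy.
    rng = np.random.default_rng(seed)
    S = T[rng.choice(len(T), M, replace=False)].copy()
    for _ in range(steps):
        S -= lr * mmd2_grad(T, S, sigma)
    return S
```

A richer instantiation would replace the raw-input MMD with a discrepancy measured in a learned feature space (as distribution-matching DC methods do) and add further objective terms, e.g. for robustness or privacy, as the unified framework permits.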