Dataset Distillation via Relative Distribution Matching and Cognitive Heritage

📅 2026-02-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the high computational and memory costs of conventional dataset distillation when applied to self-supervised pre-trained models by proposing an efficient distillation framework based on statistical flow matching. The method optimizes synthetic images via statistical flows between class centers of the original data, and combines single-step data augmentation, a lightweight linear projector, and reuse of the pre-trained classifier to substantially cut resource consumption. Experiments show the approach matches or exceeds state-of-the-art methods while using 10x less GPU memory and running 4x faster.

📝 Abstract
Dataset distillation seeks to synthesize a highly compact dataset that achieves performance comparable to the original dataset on downstream tasks. For classification tasks that use pre-trained self-supervised models as backbones, prior linear gradient matching methods optimize synthetic images by encouraging them to mimic the gradient updates that real images induce on the linear classifier. However, this batch-level formulation requires loading thousands of real images and applying multiple rounds of differentiable augmentations to the synthetic images at each distillation step, leading to substantial computational and memory overhead. In this paper, we introduce statistical flow matching, a stable and efficient supervised learning framework that optimizes synthetic images by aligning them with constant statistical flows from target class centers to non-target class centers in the original data. Our approach loads raw statistics only once and performs a single augmentation pass on the synthetic data, achieving performance comparable to or better than state-of-the-art methods with 10x lower GPU memory usage and 4x shorter runtime. Furthermore, we propose a classifier inheritance strategy that reuses the classifier trained on the original dataset for inference, requiring only an extremely lightweight linear projector and marginal storage while achieving substantial performance gains.
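The abstract's core idea can be illustrated with a minimal sketch: precompute class centers from the real data once, then optimize synthetic samples so that the "flows" from each sample to the non-target class centers match the constant real-data flows between class centers. This is a hedged, simplified reading of the paper, not the authors' implementation; all function names (`class_centers`, `flow_matching_loss`) and the exact loss form are illustrative assumptions.

```python
import numpy as np

def class_centers(features, labels, num_classes):
    # Mean feature vector per class; in the paper's setting these raw
    # statistics would be loaded once rather than recomputed per step.
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def flow_matching_loss(syn_features, syn_labels, real_centers):
    # Illustrative loss: for each synthetic sample with target class y,
    # align its flow toward every non-target center c (center_c - f)
    # with the constant real-data flow (center_c - center_y).
    num_classes = real_centers.shape[0]
    loss = 0.0
    for f, y in zip(syn_features, syn_labels):
        for c in range(num_classes):
            if c == y:
                continue
            real_flow = real_centers[c] - real_centers[y]  # constant statistic
            syn_flow = real_centers[c] - f                 # depends on synthetic sample
            loss += np.sum((syn_flow - real_flow) ** 2)
    return loss / len(syn_features)
```

Under this formulation the loss is zero exactly when each synthetic sample's features coincide with its target class center, and gradients with respect to the synthetic features involve only the precomputed centers, which is consistent with the abstract's claim of loading raw statistics only once.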
Problem

Research questions and friction points this paper is trying to address.

Dataset Distillation
Computational Overhead
Memory Efficiency
Self-supervised Models
Classification Task
Innovation

Methods, ideas, or system contributions that make the work stand out.

Statistical Flow Matching
Dataset Distillation
Classifier Inheritance
Relative Distribution Matching
Efficient Synthetic Data
Qianxin Xia
Department of XXX, University of YYY, Location, Country
Jiawei Du
National Taiwan University; ex-Intern @ Samsung Research
Speech processing · Neural coding · Generative AI · AI security
Yuhan Zhang
School of ZZZ, Institute of WWW, Location, Country
Jielei Wang
Department of XXX, University of YYY, Location, Country
Guoming Lu
Department of XXX, University of YYY, Location, Country