🤖 AI Summary
This work addresses the challenge of distilling compact synthetic image datasets from large-scale real data so that linear probes trained on them replicate the performance of probes trained on the full real dataset atop pretrained self-supervised vision models (e.g., DINO, CLIP). We propose linear gradient matching: with a frozen pretrained feature extractor, synthetic images are optimized so that the gradients they induce in a linear classifier match those produced by the real data. Crucially, the method enables cross-model distillation; for instance, a dataset distilled via a DINO backbone can effectively train a CLIP linear probe. Experiments show that the distilled synthetic datasets outperform all real-image baselines, are especially effective for fine-grained classification, and offer diagnostic utility for interpretability: they predict how similar two models' embedding spaces are and expose a model's susceptibility to spurious correlations.
📝 Abstract
The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. Existing distillation methods focus on synthesizing datasets that enable training randomly initialized models. In contrast, state-of-the-art vision approaches are increasingly building on large, pre-trained self-supervised models rather than training from scratch. In this paper, we investigate the problem of distilling datasets that enable us to optimally train linear probes on top of such large, pre-trained vision models. We introduce a method of dataset distillation for this task called Linear Gradient Matching that optimizes the synthetic images such that, when passed through a pre-trained feature extractor, they induce gradients in the linear classifier similar to those produced by the real data. Our method yields synthetic data that outperform all real-image baselines and, remarkably, generalize across pre-trained vision models, enabling us, for instance, to train a linear CLIP probe that performs competitively using a dataset distilled via a DINO backbone. Further, we show that our distilled datasets are exceptionally effective for fine-grained classification and provide a valuable tool for model interpretability, predicting, among other things, how similar two models' embedding spaces are under the platonic representation hypothesis or whether a model is sensitive to spurious correlations in adversarial datasets.
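The core objective described above, optimizing synthetic images so their linear-probe gradients match those of real data through a frozen feature extractor, can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the toy "feature extractor" (a frozen random linear map), the per-class synthetic sample count, and the cosine-distance matching loss are all assumptions made for the sake of a self-contained example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in "pretrained" feature extractor: a frozen random linear map.
# In the paper's setting this would be a model such as DINO or CLIP.
img_dim, feat_dim, n_classes = 32, 16, 4
phi = torch.nn.Linear(img_dim, feat_dim)
for p in phi.parameters():
    p.requires_grad_(False)

# Real data (flattened toy "images") with labels.
x_real = torch.randn(64, img_dim)
y_real = torch.randint(0, n_classes, (64,))

# Synthetic set to be distilled: two samples per class (an assumption).
x_syn = torch.randn(n_classes * 2, img_dim, requires_grad=True)
y_syn = torch.arange(n_classes).repeat_interleave(2)

# Linear probe whose gradients we match.
W = torch.zeros(n_classes, feat_dim, requires_grad=True)

def probe_grad(x, y):
    """Gradient of the linear probe's cross-entropy loss w.r.t. W."""
    logits = F.linear(phi(x), W)
    loss = F.cross_entropy(logits, y)
    # create_graph only when we need to backprop through this gradient,
    # i.e. when differentiating w.r.t. the synthetic images.
    (g,) = torch.autograd.grad(loss, W, create_graph=x.requires_grad)
    return g

opt = torch.optim.Adam([x_syn], lr=0.1)
losses = []
for step in range(100):
    g_real = probe_grad(x_real, y_real).detach()
    g_syn = probe_grad(x_syn, y_syn)
    # Match gradients via cosine distance (one plausible matching loss).
    match_loss = 1 - F.cosine_similarity(
        g_syn.flatten(), g_real.flatten(), dim=0
    )
    opt.zero_grad()
    match_loss.backward()
    opt.step()
    losses.append(float(match_loss))
```

After the loop, the synthetic images induce probe gradients much closer in direction to the real-data gradients than at initialization. The cross-model claim corresponds to swapping in a different frozen extractor at probe-training time than the one used during distillation.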