IMS3: Breaking Distributional Aggregation in Diffusion-Based Dataset Distillation

📅 2026-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of diffusion models in dataset distillation: they are optimized for generative likelihood rather than discriminative utility, which concentrates synthetic samples in high-density regions of the data distribution and neglects the boundary samples that matter for classification. To mitigate this, the authors propose two complementary strategies: Inversion-Matching (IM) fine-tuning and Selective Subgroup Sampling (S³). IM is an inversion-guided fine-tuning process that aligns denoising trajectories with their inversion counterparts, broadening distributional coverage and increasing diversity. S³ is a training-free sampling mechanism that improves inter-class separability by selecting synthetic subsets that are both representative and distinctive. Together, the two techniques improve the discriminative quality and generalization of distilled datasets, and the authors report state-of-the-art performance among diffusion-based distillation methods.

📝 Abstract
Dataset distillation aims to synthesize compact datasets that can approximate the training efficacy of large-scale real datasets, offering an efficient solution to the increasing computational demands of modern deep learning. Recently, diffusion-based dataset distillation methods have shown great promise by leveraging the strong generative capacity of diffusion models to produce diverse and structurally consistent samples. However, a fundamental goal misalignment persists: diffusion models are optimized for generative likelihood rather than discriminative utility, resulting in over-concentration in high-density regions and inadequate coverage of the boundary samples crucial for classification. To address this issue, we propose two complementary strategies. Inversion-Matching (IM) introduces an inversion-guided fine-tuning process that aligns denoising trajectories with their inversion counterparts, broadening distributional coverage and enhancing diversity. Selective Subgroup Sampling (S³) is a training-free sampling mechanism that improves inter-class separability by selecting synthetic subsets that are both representative and distinctive. Extensive experiments demonstrate that our approach significantly enhances the discriminative quality and generalization of distilled datasets, achieving state-of-the-art performance among diffusion-based methods.
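
The abstract describes S³ only as a training-free step that keeps synthetic samples that are "both representative and distinctive"; it does not give the scoring rule. The sketch below is one possible reading, not the authors' method. It assumes each candidate has a feature embedding, scores a candidate by its cosine similarity to its own class prototype minus its highest similarity to any other class prototype, and keeps the top-scoring candidates per class. The function name `select_subset`, the weight `alpha`, and the prototype-based score are all hypothetical, and the sketch scores individual samples rather than subgroups.

```python
import numpy as np


def select_subset(features, labels, per_class, alpha=1.0):
    """Keep `per_class` candidates per class (hypothetical S3-style scoring).

    Assumes at least two classes and no all-zero feature vectors.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Unit-normalize so that dot products are cosine similarities.
    features = features / np.linalg.norm(features, axis=1, keepdims=True)

    classes = np.unique(labels)
    # Class prototype: normalized mean of that class's candidate features.
    protos = np.stack([features[labels == c].mean(axis=0) for c in classes])
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)

    sims = features @ protos.T  # shape: (n_candidates, n_classes)

    selected = []
    for j, c in enumerate(classes):
        idx = np.flatnonzero(labels == c)
        own = sims[idx, j]                                   # representative
        other = np.delete(sims[idx], j, axis=1).max(axis=1)  # confusable
        score = own - alpha * other                          # distinctive
        best = idx[np.argsort(-score)[:per_class]]
        selected.extend(best.tolist())
    return np.array(sorted(selected))


# Toy candidates: indices 2 and 5 sit between the two classes.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5],
                [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
toy_labels = [0, 0, 0, 1, 1, 1]
kept = select_subset(toy_features, toy_labels, per_class=2)
```

In the toy data, the two candidates that are equally similar to both prototypes receive a score of zero and are dropped, which illustrates how a selection step of this kind could favor inter-class separability without any extra training.
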
Problem

Research questions and friction points this paper is trying to address.

Dataset Distillation
Diffusion Models
Distributional Aggregation
Discriminative Utility
Boundary Samples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inversion-Matching
Selective Subgroup Sampling
Diffusion-Based Dataset Distillation
Distributional Coverage
Discriminative Utility