IMS3: Breaking Distributional Aggregation in Diffusion-Based Dataset Distillation

📅 2026-03-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a key limitation of diffusion models in dataset distillation: they are optimized for generative likelihood rather than discriminative utility, so synthetic samples concentrate in high-density regions of the data distribution while boundary samples critical for classification are under-covered. To mitigate this, the authors propose two complementary strategies: Inversion-Matching (IM) fine-tuning and Selective Subgroup Sampling (S³). IM is an inversion-guided fine-tuning process that aligns denoising trajectories with their inversion counterparts, broadening distributional coverage and enhancing diversity. S³ is a training-free sampling mechanism that selects synthetic subsets that are both representative and distinctive, improving inter-class separability. Together, these techniques improve the discriminative quality and generalization of distilled data, and the authors report state-of-the-art performance among diffusion-based distillation methods.

📝 Abstract

Dataset distillation aims to synthesize compact datasets that can approximate the training efficacy of large-scale real datasets, offering an efficient solution to the increasing computational demands of modern deep learning. Recently, diffusion-based dataset distillation methods have shown great promise by leveraging the strong generative capacity of diffusion models to produce diverse and structurally consistent samples. However, a fundamental goal misalignment persists: diffusion models are optimized for generative likelihood rather than discriminative utility, resulting in over-concentration in high-density regions and inadequate coverage of boundary samples crucial for classification. To address this issue, we propose two complementary strategies. Inversion-Matching (IM) introduces an inversion-guided fine-tuning process that aligns denoising trajectories with their inversion counterparts, broadening distributional coverage and enhancing diversity. Selective Subgroup Sampling (S³) is a training-free sampling mechanism that improves inter-class separability by selecting synthetic subsets that are both representative and distinctive. Extensive experiments demonstrate that our approach significantly enhances the discriminative quality and generalization of distilled datasets, achieving state-of-the-art performance among diffusion-based methods.
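
The abstract does not spell out how S³ scores candidates, so the following is only an illustrative sketch of what a training-free, "representative and distinctive" subset selection could look like. It is not the authors' algorithm. The function name `select_distinctive_subset`, the use of class-mean cosine similarity, and the scoring rule (similarity to the own-class mean minus the highest similarity to any other class mean) are all assumptions made for this example.

```python
import numpy as np

def select_distinctive_subset(feats, labels, target, k):
    """Hypothetical selection rule (an assumption, not the paper's method).

    Among samples of class `target`, keep the k with the highest score:
      score = similarity to own class mean      (representative)
            - max similarity to another class   (distinctive)
    Requires at least two classes in `labels`.
    """
    feats = np.asarray(feats, dtype=float)
    labels = np.asarray(labels)
    # Unit-normalize so dot products behave like cosine similarities.
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)

    classes = np.unique(labels)
    means = {c: feats[labels == c].mean(axis=0) for c in classes}

    idx = np.flatnonzero(labels == target)      # candidate indices
    own = feats[idx] @ means[target]            # closeness to own class
    others = np.stack(
        [feats[idx] @ means[c] for c in classes if c != target], axis=1
    )
    score = own - others.max(axis=1)            # penalize the closest rival

    # Indices into the original array, best score first.
    return idx[np.argsort(-score)[:k]]
```

Because the rule only ranks precomputed features, it needs no gradient updates, which matches the "training-free" property stated in the abstract. In practice the features would presumably come from a pretrained encoder applied to generated samples, which is a further assumption here.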

Problem

Research questions and friction points this paper is trying to address.

Dataset Distillation

Diffusion Models

Distributional Aggregation

Discriminative Utility

Boundary Samples

Innovation

Methods, ideas, or system contributions that make the work stand out.

Inversion-Matching

Selective Subgroup Sampling

Diffusion-Based Dataset Distillation