MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models

📅 2025-05-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing dataset distillation methods suffer from limited synthetic-sample diversity and a heavy reliance on fine-tuning generative models with distillation losses. To address these bottlenecks, the paper proposes MGD$^3$, a fine-tuning-free, mode-guided diffusion framework with three stages: (i) Mode Discovery, which clusters each class's data to identify distinct modes; (ii) Mode Guidance, which steers conditional sampling toward those modes to improve intra-class diversity; and (iii) Stop Guidance, which halts the guidance signal partway through denoising to suppress artifacts. Crucially, the method operates entirely on top of a pre-trained diffusion model, requiring no parameter updates, auxiliary training, or custom distillation loss. Evaluated on ImageNette, ImageIDC, ImageNet-100, and ImageNet-1K, it improves classification accuracy over the state of the art by 4.4%, 2.9%, 1.6%, and 1.6%, respectively, while substantially reducing computational overhead.

📝 Abstract
Dataset distillation has emerged as an effective strategy, significantly reducing training costs and facilitating more efficient model deployment. Recent advances have leveraged generative models to distill datasets by capturing the underlying data distribution. Unfortunately, existing methods require model fine-tuning with distillation losses to encourage diversity and representativeness. However, these methods do not guarantee sample diversity, limiting their performance. We propose a mode-guided diffusion model leveraging a pre-trained diffusion model without the need to fine-tune with distillation losses. Our approach addresses dataset diversity in three stages: Mode Discovery to identify distinct data modes, Mode Guidance to enhance intra-class diversity, and Stop Guidance to mitigate artifacts in synthetic samples that affect performance. Our approach outperforms state-of-the-art methods, achieving accuracy gains of 4.4%, 2.9%, 1.6%, and 1.6% on ImageNette, ImageIDC, ImageNet-100, and ImageNet-1K, respectively. Our method eliminates the need for fine-tuning diffusion models with distillation losses, significantly reducing computational costs. Our code is available on the project webpage: https://jachansantiago.github.io/mode-guided-distillation/
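The three stages described in the abstract can be sketched in toy form. The snippet below is an illustrative NumPy mock-up, not the authors' implementation: `mode_discovery` is a plain k-means over per-class features, and `sample_with_mode_guidance` replaces the pre-trained diffusion denoiser with a trivial shrinkage update; the function names, the guidance `scale`, and the `stop_frac` cutoff are all assumptions introduced here for clarity.

```python
import numpy as np

def mode_discovery(features, n_modes, n_iters=20, seed=0):
    """Stage 1 (Mode Discovery): k-means over one class's features to find modes."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), n_modes, replace=False)]
    for _ in range(n_iters):
        # assign each feature vector to its nearest mode center
        dists = np.linalg.norm(features[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_modes):
            if (labels == k).any():
                centers[k] = features[labels == k].mean(axis=0)
    return centers

def sample_with_mode_guidance(mode_center, steps=50, stop_frac=0.6,
                              scale=0.3, seed=0):
    """Stages 2-3: denoise from pure noise while nudging the sample toward a
    chosen mode (Mode Guidance), then disable the guidance term after
    stop_frac of the steps (Stop Guidance) so late denoising is unconstrained."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(mode_center.shape)
    stop_step = int(steps * stop_frac)
    for t in range(steps):
        x = 0.98 * x  # toy stand-in for one pre-trained denoising step
        if t < stop_step:
            # Stage 2: pull the sample toward the discovered mode center
            x = x + scale * (mode_center - x)
        # Stage 3: past stop_step, no guidance is applied (artifact mitigation)
    return x
```

In a real pipeline the shrinkage line would be one reverse step of a frozen pre-trained diffusion model, and the guidance term would act in its latent space; the point of the sketch is only the control flow: cluster once per class, guide each sample toward a different mode, and cut the guidance off before the final denoising steps.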
Problem

Research questions and friction points this paper is trying to address.

Ensuring dataset diversity without fine-tuning diffusion models
Improving intra-class diversity in distilled datasets
Reducing computational costs in dataset distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mode-Guided Diffusion Model without fine-tuning
Three-stage diversity enhancement strategy
Eliminates distillation loss computational costs