ManifoldGD: Training-Free Hierarchical Manifold Guidance for Diffusion-Based Dataset Distillation

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing training-free dataset distillation methods, which often employ simplistic guidance strategies that fail to balance semantic consistency with sample diversity. To overcome this, we propose a novel training-free distillation framework for diffusion models that, for the first time, incorporates the geometry of local latent manifolds into the distillation process. Our approach leverages a pretrained VAE to extract features, constructs a multi-scale IPC (Images Per Class) coreset via hierarchical clustering, and enforces geometry-aware constraints by projecting alignment vectors onto the tangent spaces of local manifolds at each denoising step. Extensive experiments demonstrate that our method consistently outperforms both training-free and training-based baselines across multiple metrics, including FID, ℓ² distance between embeddings of synthetic and real data, and downstream classification accuracy, significantly enhancing both the efficiency and quality of distilled data.
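The multi-scale coreset construction described above can be sketched with a simple divisive (top-down) clustering of the class latents: repeatedly split the largest cluster with 2-means until the per-class budget is reached, then keep one centroid per cluster. This is a minimal NumPy illustration under stated assumptions; the function names (`two_means`, `multiscale_coreset`) are hypothetical, and the paper's exact clustering criterion may differ.

```python
import numpy as np

def two_means(X, iters=20, seed=0):
    """Plain 2-means (Lloyd's algorithm), used as the divisive split step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest of the two centers.
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def multiscale_coreset(latents, ipc, seed=0):
    """Divisively split the VAE latents of one class until there are `ipc`
    clusters, always splitting the largest one. Early splits capture coarse
    semantic modes; later splits capture fine intra-class variability."""
    clusters = [np.arange(len(latents))]
    while len(clusters) < ipc:
        clusters.sort(key=len, reverse=True)
        idx = clusters.pop(0)
        if len(idx) < 2:          # nothing left to split
            clusters.append(idx)
            break
        labels = two_means(latents[idx], seed=seed)
        a, b = idx[labels == 0], idx[labels == 1]
        if len(a) == 0 or len(b) == 0:  # degenerate split, stop early
            clusters.append(idx)
            break
        clusters += [a, b]
    # One centroid per cluster: the multi-scale IPC coreset for this class.
    return np.stack([latents[idx].mean(axis=0) for idx in clusters])
```

In practice the input would be per-class VAE latent features, and the returned centroids would serve as the guidance targets during denoising.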

📝 Abstract
Large-scale datasets hinder efficient model training and often contain redundant concepts. Dataset distillation aims to synthesize compact datasets that preserve the knowledge of large-scale training sets while drastically reducing storage and computation. Recent advances in diffusion models have enabled training-free distillation by leveraging pre-trained generative priors; however, existing guidance strategies remain limited. Current score-based methods either perform unguided denoising or rely on simple mode-based guidance toward instance prototype centroids (IPC centroids), which is often rudimentary and suboptimal. We propose Manifold-Guided Distillation (ManifoldGD), a training-free diffusion-based framework that integrates manifold-consistent guidance at every denoising timestep. Our method computes IPCs via hierarchical, divisive clustering of VAE latent features, yielding a multi-scale coreset of IPCs that captures both coarse semantic modes and fine intra-class variability. Using a local neighborhood of each extracted IPC centroid, we estimate a local latent manifold at each denoising timestep. At each denoising step, we project the mode-alignment vector onto the local tangent space of the estimated latent manifold, thus constraining the generation trajectory to remain manifold-faithful while preserving semantic consistency. This formulation improves representativeness, diversity, and image fidelity without requiring any model retraining. Empirical results demonstrate consistent gains over existing training-free and training-based baselines in terms of FID, ℓ² distance between real and synthetic dataset embeddings, and classification accuracy, establishing ManifoldGD as the first geometry-aware training-free dataset distillation framework.
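The tangent-space projection step in the abstract can be illustrated with a common estimator: take the latent codes in a neighborhood of the target centroid, fit a local PCA basis, and orthogonally project the mode-alignment vector onto its span. This is a minimal sketch under stated assumptions; the function name `tangent_project`, the PCA-based tangent estimator, and the fixed intrinsic dimension `dim` are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def tangent_project(v, neighbors, centroid, dim=3):
    """Project an alignment vector `v` (D,) onto the tangent space of the
    local latent manifold at `centroid` (D,), estimated by PCA over a
    neighborhood of latent codes `neighbors` (k, D). `dim` is the assumed
    intrinsic dimension of the local manifold."""
    # Center the neighborhood at the centroid and take the top
    # principal directions as an orthonormal tangent basis.
    A = neighbors - centroid                       # (k, D)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    basis = Vt[:dim]                               # (dim, D), orthonormal rows
    # Orthogonal projection of v onto span(basis).
    return basis.T @ (basis @ v)
```

In a guided sampler, the projected vector would replace the raw centroid-alignment direction when nudging the denoising trajectory, e.g. `x = x + scale * tangent_project(centroid - x0_hat, neighbors, centroid)`, so the update stays on (an estimate of) the data manifold.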
Problem

Research questions and friction points this paper is trying to address.

dataset distillation
diffusion models
manifold guidance
training-free
data efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Manifold Guidance
Training-Free Distillation
Diffusion Models
Hierarchical Clustering
Latent Manifold