Going Beyond Feature Similarity: Effective Dataset distillation based on Class-aware Conditional Mutual Information

📅 2024-12-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing dataset distillation methods rely on feature distribution matching, resulting in synthetically generated data with excessive complexity and poor learnability. To address this, we propose a class-aware complexity metric grounded in conditional mutual information (CMI) and, as a novel contribution, formulate CMI minimization as a regularizer jointly optimized with the distillation loss. This yields compact, highly learnable, and generalizable synthetic datasets. Our method estimates CMI within a pre-trained feature space to enable empirical minimization and embeds it into a plug-and-play multi-objective optimization framework compatible with mainstream distillation algorithms. Extensive experiments across multiple benchmarks demonstrate consistent improvements: +2.1–4.7 percentage points in downstream task accuracy and faster training convergence. By grounding distillation in an information-theoretic, interpretable, and differentiable complexity measure, our approach establishes a principled, optimization-aware paradigm for dataset distillation.

📝 Abstract
Dataset distillation (DD) aims to minimize the time and memory needed for training deep neural networks on large datasets by creating a smaller synthetic dataset that achieves performance similar to that of the full real dataset. However, current dataset distillation methods often produce synthetic datasets that are excessively difficult for networks to learn from, because they compress a substantial amount of information from the original data through metrics measuring feature similarity, e.g., distribution matching (DM). In this work, we introduce conditional mutual information (CMI) to assess the class-aware complexity of a dataset and propose a novel method that minimizes CMI. Specifically, we simultaneously minimize the distillation loss and constrain the class-aware complexity of the synthetic dataset by minimizing its empirical CMI in the feature space of pre-trained networks. Through a thorough set of experiments, we show that our method serves as a general regularization method for existing DD methods, improving both performance and training efficiency.
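The abstract describes minimizing an empirical CMI computed from a pre-trained network's outputs. A common way to estimate this class-aware CMI is as the average KL divergence between each sample's predicted output distribution and its class-conditional centroid distribution. The sketch below illustrates that estimator on raw logits; it is a minimal NumPy illustration under this assumption, not the paper's actual implementation (the function name `empirical_cmi` and the centroid-based formulation are ours).

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with a max-shift for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def empirical_cmi(logits, labels):
    """Estimate class-aware CMI as the mean KL divergence between each
    sample's output distribution P(y_hat | x) and the centroid distribution
    of its class (the average of P(y_hat | x) over samples with label y).
    This is zero exactly when every sample in a class yields the same
    output distribution, i.e. the dataset is maximally 'simple' per class."""
    labels = np.asarray(labels)
    probs = softmax(np.asarray(logits, dtype=float))
    eps = 1e-12  # avoid log(0)
    total = 0.0
    for c in np.unique(labels):
        p_c = probs[labels == c]                 # per-sample distributions in class c
        q_c = p_c.mean(axis=0, keepdims=True)    # class centroid distribution
        total += np.sum(p_c * (np.log(p_c + eps) - np.log(q_c + eps)))
    return total / len(labels)
```

In a joint objective of the kind the paper describes, this quantity would be added to the distillation loss as a weighted regularizer (e.g. `loss = distill_loss + lam * cmi`), so that gradient steps on the synthetic data also reduce its class-aware complexity.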
Problem

Research questions and friction points this paper is trying to address.

Reduce training time with synthetic datasets
Minimize class-aware dataset complexity
Enhance dataset distillation performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Minimizes conditional mutual information
Enhances dataset distillation efficiency
Improves synthetic dataset learnability