🤖 AI Summary
Existing dataset distillation methods assume clean labels, which leaves them vulnerable to the label noise common in real-world data and prone to bias from mislabeled samples. To address this, the authors propose Trust-Aware Diversion (TAD), a dual-loop optimization framework that decouples distillation into trusted and untrusted sample subspaces. The outer loop dynamically models sample trustworthiness and selects high-confidence samples, so that distillation is driven by trusted data. The inner loop recalibrates untrusted samples via gradient recalibration and sample reweighting to recover latent supervisory signals. A noise-aware discrimination mechanism, driven jointly by loss and prediction consistency, lets the two loops progressively expand the trusted subspace. Evaluated on CIFAR-10/100 and Tiny ImageNet under symmetric, asymmetric, and real-world label noise, TAD consistently improves multiple distillation methods, achieving average accuracy gains of 3.2–5.8%.
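The trusted/untrusted split described above can be illustrated with a small sketch. This is not the authors' implementation: the function name `split_trusted`, the thresholds `loss_quantile` and `min_agreement`, and the reading of "prediction consistency" as agreement between recent predictions and the given label are all assumptions made for illustration.

```python
import numpy as np

def split_trusted(losses, pred_history, labels, loss_quantile=0.5, min_agreement=0.8):
    """Return a boolean mask marking samples treated as trusted (hypothetical helper).

    losses:       (N,) per-sample loss under the current model
    pred_history: (T, N) predicted class per sample over the last T epochs
    labels:       (N,) given, possibly noisy, labels
    """
    losses = np.asarray(losses, dtype=float)
    pred_history = np.asarray(pred_history)
    labels = np.asarray(labels)
    # Small-loss criterion (assumed): keep samples at or below the chosen loss quantile.
    low_loss = losses <= np.quantile(losses, loss_quantile)
    # Consistency criterion (assumed): fraction of epochs whose prediction matches the label.
    agreement = (pred_history == labels[None, :]).mean(axis=0)
    # A sample is trusted only if both criteria hold.
    return low_loss & (agreement >= min_agreement)

losses = [0.1, 0.2, 2.5, 3.0]
labels = [0, 1, 0, 1]
pred_history = [[0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 1, 0, 0]]
mask = split_trusted(losses, pred_history, labels)
print(mask)  # the first two samples pass both criteria, the last two fail both
```

Requiring both criteria keeps the trusted set conservative; samples that fail either test fall into the untrusted set, which the inner loop would then try to recalibrate.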
📝 Abstract
Dataset distillation compresses a large dataset into a small synthetic subset that retains essential information. Existing methods assume that all samples are perfectly labeled, which limits their real-world applicability, since incorrect labels are ubiquitous in practice. These mislabeled samples introduce untrustworthy information into the dataset and mislead model optimization during distillation. To tackle this issue, we propose a Trust-Aware Diversion (TAD) dataset distillation method. TAD introduces an iterative dual-loop optimization framework for data-effective distillation. Specifically, the outer loop divides data into trusted and untrusted spaces and redirects distillation toward trusted samples, which minimizes the impact of mislabeled samples on the distillation process. The inner loop maximizes the distillation objective by recalibrating untrusted samples, thus transforming them into valuable ones for distillation. The two loops iteratively refine and compensate for each other, gradually expanding the trusted space and shrinking the untrusted space. Experiments demonstrate that our method can significantly improve the performance of existing dataset distillation methods on three widely used benchmarks (CIFAR10, CIFAR100, and Tiny ImageNet) in three challenging mislabeled settings (symmetric, asymmetric, and real-world).
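For readers unfamiliar with the symmetric setting, the following sketch shows one common way to inject symmetric label noise into a clean label vector. It is an illustration only: the function name `add_symmetric_noise` is hypothetical, and the convention used here (a fixed fraction of labels is flipped uniformly to a different class) is an assumption, since the abstract does not specify the exact protocol.

```python
import numpy as np

def add_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a fraction `noise_rate` of labels uniformly to a different class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    n_flip = int(round(noise_rate * len(labels)))
    # Choose which samples get corrupted, without repetition.
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    # An offset in [1, num_classes - 1] guarantees the new label differs from the old one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy

clean = np.arange(1000) % 10          # toy labels for a 10-class problem
noisy = add_symmetric_noise(clean, noise_rate=0.4, num_classes=10)
print((noisy != clean).mean())        # fraction of corrupted labels
```

Asymmetric noise differs in that flips follow a class-dependent mapping rather than a uniform one, and real-world noise comes from actual annotation errors rather than a synthetic process.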