Fair Dataset Distillation via Cross-Group Barycenter Alignment

📅 2026-04-30

🤖 AI Summary

This work addresses the fairness gap that dataset distillation can introduce when it overlooks differences in prediction patterns across subpopulations. The study shows that the gap stems primarily from mismatched predictive signals among subgroups, not merely from imbalanced sample sizes, and proposes an unbiased distillation framework based on optimal transport barycenter alignment. By constructing a shared barycenter of predictive information that does not depend on group proportions, the method steers distillation toward representations that are consistent across subpopulations. Experiments on multiple benchmark datasets show that the approach substantially reduces performance disparities among subgroups while preserving overall model accuracy, mitigating the bias induced by distillation.
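The central object in the summary is a barycenter of predictive information whose weights do not depend on group sizes. As a minimal sketch, assuming each group's predictive signal is reduced to a one-dimensional score such as a single feature or logit (this reduction is our assumption, not the paper's), the Wasserstein-2 barycenter has a closed form: its quantile function is the weighted average of the groups' quantile functions. The helper `w2_barycenter_1d` is hypothetical. Uniform weights give a group-imbalance-agnostic target, while size-proportional weights reproduce the majority-dominated behavior the paper warns about.

```python
import numpy as np

def w2_barycenter_1d(groups, weights=None, n_levels=101):
    """Hypothetical helper: W2 barycenter of 1-D empirical distributions.

    Returns the barycenter's quantile function sampled at n_levels evenly
    spaced levels in [0, 1]. In 1-D, the W2 barycenter's quantile function
    is the weighted average of the input quantile functions.
    weights=None gives every group equal weight regardless of sample size.
    """
    levels = np.linspace(0.0, 1.0, n_levels)
    # Row g holds the empirical quantile function of group g
    q = np.stack([np.quantile(np.asarray(g, dtype=float), levels) for g in groups])
    if weights is None:
        weights = np.full(len(groups), 1.0 / len(groups))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so weights sum to 1
    return weights @ q

# Usage: a large majority group and a small minority group of 1-D scores.
majority = np.linspace(-1.0, 1.0, 900)  # median 0
minority = np.linspace(3.0, 5.0, 100)   # median 4
fair = w2_barycenter_1d([majority, minority])  # equal group weights
size_weighted = w2_barycenter_1d([majority, minority], weights=[900, 100])
print(fair[50], size_weighted[50])  # values at quantile level 0.5
```

With these toy groups, the equal-weight barycenter has a median of about 2, halfway between the groups, whereas size-proportional weights place it at about 0.4, close to the majority.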

📝 Abstract

Dataset Distillation aims to compress a large dataset into a small synthetic one while maintaining predictive performance. We show that because different demographic groups exhibit distinct predictive patterns, the distillation process struggles to preserve informative signals for all subgroups simultaneously, regardless of whether group sizes are mildly or severely imbalanced. Consequently, models trained on distilled data can suffer substantial performance drops for certain subgroups, leading to fairness gaps. Crucially, these gaps do not disappear when group imbalance alone is corrected, since they stem from fundamental mismatches in subgroup predictive patterns rather than from sample-size disparities alone. We therefore formally analyze the interaction between these two sources of bias and cast the solution as identifying a group-imbalance-agnostic barycenter of the predictive information that induces similar representations across all subgroups. We show that distilling toward this shared aggregate representation can reduce group fairness concerns. Our approach is compatible with existing distillation methods, and empirical results show that it substantially reduces the bias introduced by dataset distillation.
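The abstract states that the approach is compatible with existing distillation methods. One way to picture this, offered only as a minimal sketch under our own assumptions, is to add a barycenter-alignment penalty to a base distillation loss. Here the base loss is a simple mean-embedding match (a stand-in for distribution-matching objectives), and the penalty is a sliced Wasserstein-2 distance between the synthetic features and an equal-weight barycenter of the real subgroups. Slicing (random 1-D projections) is a swapped-in approximation chosen for brevity: the per-slice barycenters are not, in general, projections of a single d-dimensional barycenter. All function names and the weight `lam` are hypothetical and do not come from the paper.

```python
import numpy as np

def mean_matching_loss(synthetic, real):
    """Stand-in base distillation loss: squared distance between mean feature vectors."""
    return float(np.sum((synthetic.mean(axis=0) - real.mean(axis=0)) ** 2))

def sliced_barycenter_alignment(synthetic, groups, n_projections=64, n_levels=101, seed=0):
    """Hypothetical fairness penalty: sliced W2^2 between synthetic features and an
    equal-weight barycenter of the real subgroups, computed slice by slice.

    synthetic: array (m, d); groups: list of arrays (n_g, d), sizes may differ.
    """
    rng = np.random.default_rng(seed)
    d = synthetic.shape[1]
    directions = rng.normal(size=(n_projections, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)  # unit vectors
    levels = np.linspace(0.0, 1.0, n_levels)
    total = 0.0
    for theta in directions:
        # 1-D quantile functions of each group's projected features
        group_q = np.stack([np.quantile(g @ theta, levels) for g in groups])
        bary_q = group_q.mean(axis=0)  # equal weight per group, ignoring group sizes
        syn_q = np.quantile(synthetic @ theta, levels)
        # Mean squared quantile gap approximates W2^2 between the 1-D distributions
        total += np.mean((syn_q - bary_q) ** 2)
    return float(total / n_projections)

def combined_loss(synthetic, groups, lam=1.0):
    """Base loss on the pooled real data plus lam times the barycenter penalty."""
    pooled = np.concatenate(groups, axis=0)
    return mean_matching_loss(synthetic, pooled) + lam * sliced_barycenter_alignment(synthetic, groups)

# Usage with toy features: the minority group is a shifted copy of the majority.
rng = np.random.default_rng(1)
majority = rng.normal(size=(200, 2))
minority = majority + np.array([4.0, 0.0])
synthetic = majority + np.array([2.0, 0.0])  # sits halfway between the two groups
print(combined_loss(synthetic, [majority, minority], lam=1.0))
```

Because the penalty averages group quantiles with equal weights, a small minority group pulls the target as strongly as a large majority, whereas the pooled mean in the base loss is dominated by whichever group contributes more samples.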

Problem

Research questions and friction points this paper is trying to address.

Dataset Distillation

Fairness

Subgroup Bias

Predictive Patterns

Barycenter Alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fair Dataset Distillation

Cross-Group Barycenter Alignment

Predictive Pattern Disparity