🤖 AI Summary
Existing dataset distillation (DD) methods rely on a single teacher model, whose architectural prior biases synthesis toward overly smooth, homogeneous samples, undermining intra-class diversity and generalization. To address this, we propose PRISM, the first DD framework to decouple logit matching from regularization: the former is guided by a primary teacher, while the latter is supervised by heterogeneous auxiliary teachers. PRISM further introduces cross-class batch synthesis for efficient parallel generation and a dual supervision strategy (backbone logit matching combined with BatchNorm statistics alignment over random teacher subsets) that enables decoupled optimization. On ImageNet-1K, PRISM consistently outperforms state-of-the-art methods, including SRe2L and G-VBSM, under low-to-moderate images-per-class (IPC) settings. Notably, it yields markedly lower intra-class feature cosine similarity, empirically validating its effectiveness in enhancing data diversity and downstream generalization.
📝 Abstract
Dataset distillation (DD) promises compact yet faithful synthetic data, but existing approaches often inherit the inductive bias of a single teacher model. As dataset size increases, this bias drives generation toward overly smooth, homogeneous samples, reducing intra-class diversity and limiting generalization. We present PRISM (PRIors from diverse Source Models), a framework that disentangles architectural priors during synthesis. PRISM decouples the logit-matching and regularization objectives, supervising them with different teacher architectures: a primary model for logits and a stochastic subset for batch-normalization (BN) alignment. On ImageNet-1K, PRISM consistently and reproducibly outperforms single-teacher methods (e.g., SRe2L) and recent multi-teacher variants (e.g., G-VBSM) at low- and mid-IPC regimes. The generated data also show significantly richer intra-class diversity, as reflected by a notable drop in cosine similarity between features. We further analyze teacher selection strategies (pre- vs. intra-distillation) and introduce a scalable cross-class batch formation scheme for fast parallel synthesis. Code will be released after the review period.
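The decoupled objective described above can be sketched as a sum of two terms: a logit-matching loss from a single primary teacher, plus a BN-statistics alignment regularizer averaged over a randomly sampled subset of auxiliary teachers. The sketch below is a minimal, hedged illustration of that structure, not the authors' implementation; the function names, the squared-error form of the BN term, and the weighting `lam` are assumptions for illustration.

```python
import numpy as np

def ce_loss(logits, label):
    # Cross-entropy of softmax(logits) against an integer class label,
    # standing in for the primary teacher's logit-matching objective.
    z = logits - logits.max()                     # numerical stability
    logp = z - np.log(np.exp(z).sum())
    return -logp[label]

def bn_alignment(batch_mean, batch_var, run_mean, run_var):
    # Squared-error match between the synthetic batch's feature statistics
    # and a teacher's stored BatchNorm running statistics (one layer shown).
    return np.sum((batch_mean - run_mean) ** 2) + np.sum((batch_var - run_var) ** 2)

def prism_style_loss(primary_logits, label, aux_stats, rng, k=2, lam=0.01):
    """aux_stats: list of per-auxiliary-teacher tuples
       (batch_mean, batch_var, running_mean, running_var).
       A random subset of k teachers supervises the BN regularizer,
       while the primary teacher alone supervises the logits."""
    idx = rng.choice(len(aux_stats), size=k, replace=False)
    reg = np.mean([bn_alignment(*aux_stats[i]) for i in idx])
    return ce_loss(primary_logits, label) + lam * reg
```

In an actual synthesis loop this loss would be differentiated with respect to the synthetic images; sampling a fresh teacher subset per step is what injects heterogeneous architectural priors into the regularizer while keeping the logit target stable.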