Improving Knowledge Distillation Under Unknown Covariate Shift Through Confidence-Guided Data Augmentation

📅 2025-06-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In knowledge distillation, unknown covariate shift causes student models to over-rely on spurious features present in the training set, degrading robustness under test-time distribution shift. To address this, we propose confidence-guided adversarial diffusion augmentation: teacher confidence scores identify hard samples on which teacher and student predictions disagree most, and a diffusion model is then guided to generate targeted, robustness-enhancing data, without requiring prior knowledge of the spurious features. The method unifies confidence-aware modeling, diffusion-based generation, and group-robust optimization. Evaluated on CelebA, SpuCo Birds, and Spurious ImageNet, it significantly improves worst-group accuracy (+3.2–5.8%), average-group accuracy, and spurious-feature mAUC, consistently outperforming existing diffusion-based augmentation baselines. This work establishes a new paradigm for robust knowledge distillation under unknown distribution shift.
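The core selection signal described above, prediction disagreement between a robust teacher and the student, can be sketched as a per-sample KL divergence over the two models' softmax outputs. The function and variable names below are illustrative, not from the paper:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over class logits.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def disagreement_scores(teacher_logits, student_logits, eps=1e-12):
    # KL(teacher || student) per sample: high where the student
    # fails to match the (presumed robust) teacher.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

# Rank a toy batch: the highest-scoring samples are the "hard" ones
# that would seed the diffusion-based augmentation.
teacher = np.array([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
student = np.array([[4.0, 0.0, 0.0], [4.0, 0.0, 0.0]])  # disagrees on sample 2
scores = disagreement_scores(teacher, student)
hard_first = np.argsort(-scores)  # indices sorted by descending disagreement
```

Here the first sample yields near-zero disagreement (identical predictions) while the second, where teacher and student pick different classes, scores highly and would be selected for augmentation.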

📝 Abstract
Large foundation models trained on extensive datasets demonstrate strong zero-shot capabilities in various domains. To replicate their success when data and model size are constrained, knowledge distillation has become an established tool for transferring knowledge from foundation models to small student networks. However, the effectiveness of distillation is critically limited by the available training data. This work addresses the common practical issue of covariate shift in knowledge distillation, where spurious features appear during training but not at test time. We ask the question: when these spurious features are unknown, yet a robust teacher is available, is it possible for a student to also become robust to them? We address this problem by introducing a novel diffusion-based data augmentation strategy that generates images by maximizing the disagreement between the teacher and the student, effectively creating challenging samples that the student struggles with. Experiments demonstrate that our approach significantly improves worst-group and mean-group accuracy on CelebA and SpuCo Birds, as well as the spurious mAUC on Spurious ImageNet under covariate shift, outperforming state-of-the-art diffusion-based data augmentation baselines.
Problem

Research questions and friction points this paper is trying to address.

Addressing covariate shift in knowledge distillation with unknown spurious features
Improving student model robustness using teacher-student disagreement-guided data augmentation
Enhancing worst-group and mean accuracy under covariate shift conditions
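The group metrics named above are straightforward to compute once each sample carries a group label (e.g. a spurious-attribute combination). A minimal sketch, with hypothetical names and toy data:

```python
import numpy as np

def group_accuracies(preds, labels, groups):
    # Accuracy computed separately within each group;
    # worst-group accuracy is the minimum over groups.
    accs = {}
    for g in np.unique(groups):
        mask = groups == g
        accs[int(g)] = float((preds[mask] == labels[mask]).mean())
    return accs

preds  = np.array([0, 1, 1, 0, 1, 1])
labels = np.array([0, 1, 0, 0, 1, 1])
groups = np.array([0, 0, 1, 1, 1, 1])  # e.g. spurious-attribute groups

accs = group_accuracies(preds, labels, groups)
worst = min(accs.values())            # worst-group accuracy
mean  = sum(accs.values()) / len(accs)  # mean-group accuracy
```

Reporting the worst-group number, rather than overall accuracy, is what exposes reliance on spurious features: a model can score well on average while failing badly on the minority group.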
Innovation

Methods, ideas, or system contributions that make the work stand out.

Confidence-guided diffusion data augmentation
Generates samples that maximize teacher-student disagreement
Improves worst-group and mean-group accuracy under covariate shift
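The page does not specify how the diffusion model is steered toward high-disagreement samples, but guidance of this kind is typically a gradient step on the sampling trajectory. As a purely illustrative stand-in (not the paper's method), the sketch below nudges a latent in the direction that increases a disagreement objective, using finite differences in place of backpropagation:

```python
import numpy as np

def guided_step(x, disagreement_fn, step=0.1, eps=1e-3):
    # One hypothetical guidance step: move latent x uphill on the
    # disagreement objective. The gradient is estimated by central
    # finite differences, standing in for the autograd gradient a
    # real diffusion sampler would use.
    grad = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d.flat[i] = eps
        grad.flat[i] = (disagreement_fn(x + d) - disagreement_fn(x - d)) / (2 * eps)
    return x + step * grad

# Toy objective: disagreement peaks at a fixed target point.
target = np.ones(3)
fn = lambda v: -float(np.sum((v - target) ** 2))
x0 = np.zeros(3)
x1 = guided_step(x0, fn)  # moves toward the high-disagreement region
```

In the actual pipeline this nudge would be applied at each denoising step, so the generated image drifts toward regions where the student disagrees with the teacher.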