Explanation is All You Need in Distillation: Mitigating Bias and Shortcut Learning

📅 2024-07-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
AI models in critical domains such as medicine suffer degraded out-of-distribution (OOD) generalization across hospitals and patients because of shortcut learning, e.g., spurious correlations between background/foreground features and labels. Method: the paper proposes an explanation distillation framework that requires neither unbiased data nor group annotations. Knowledge transfer is performed solely via explanation maps (e.g., Layer-wise Relevance Propagation), guiding a student model to inherit the causal decision logic of an unbiased teacher, such as a vision-language model (VLM), and thereby suppress reliance on biased features. Contribution/Results: the method achieves 98.2% OOD accuracy on COLOURED MNIST, substantially surpassing deep feature distillation (92.1%) and IRM (60.2%). On COCO-on-Places, it reduces the gap between in-distribution and OOD accuracy to 4.4%, outperforming the compared approaches. Its core contribution is decoupling explanatory signals from data biases, enabling robust OOD generalization without unbiased data or group labels.

📝 Abstract
Bias and spurious correlations in data can cause shortcut learning, undermining out-of-distribution (OOD) generalization in deep neural networks. Most methods require unbiased data during training (and/or hyper-parameter tuning) to counteract shortcut learning. Here, we propose the use of explanation distillation to hinder shortcut learning. The technique does not assume any access to unbiased data, and it allows an arbitrarily sized student network to learn the reasons behind the decisions of an unbiased teacher, such as a vision-language model or a network processing debiased images. We found that it is possible to train a neural network with explanation (e.g., by Layer-wise Relevance Propagation, LRP) distillation only, and that the technique leads to high resistance to shortcut learning, surpassing group-invariant learning, explanation background minimization, and alternative distillation techniques. In the COLOURED MNIST dataset, LRP distillation achieved 98.2% OOD accuracy, while deep feature distillation and IRM achieved 92.1% and 60.2%, respectively. In COCO-on-Places, the undesirable generalization gap between in-distribution and OOD accuracy is only 4.4% for LRP distillation, while the other two techniques present gaps of 15.1% and 52.1%, respectively.
Problem

Research questions and friction points this paper is trying to address.

Preventing shortcut learning in AI medical applications
Mitigating image background and foreground bias
Improving generalization to unseen hospitals and patients
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training with explanation heatmaps alone
Matching teacher heatmaps without output loss
Resisting background and foreground bias without segmenters
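The key idea in the list above, supervising a student only through its explanation maps and never through an output loss, can be sketched with a toy example. Everything here is an illustrative assumption, not the paper's implementation: linear models stand in for the networks, and gradient-times-input relevance stands in for LRP.

```python
import random

def heatmap(w, x):
    # Gradient x input relevance for a linear scorer w.x, a simple
    # stand-in for the LRP heatmaps the paper computes on deep networks.
    return [wi * xi for wi, xi in zip(w, x)]

def normalize(h):
    # Keep only the pattern of relevance, discarding its overall scale.
    s = sum(abs(v) for v in h) + 1e-8
    return [v / s for v in h]

def explanation_distillation_loss(w_student, w_teacher, batch):
    # Match the student's heatmaps to the frozen teacher's heatmaps.
    # Note there is no term on either model's outputs: the student is
    # supervised purely through its explanations.
    total = 0.0
    for x in batch:
        hs = normalize(heatmap(w_student, x))
        ht = normalize(heatmap(w_teacher, x))
        total += sum((a - b) ** 2 for a, b in zip(hs, ht)) / len(hs)
    return total / len(batch)

# Toy data: feature 0 is causal, feature 1 is a spurious shortcut.
rng = random.Random(0)
batch = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(16)]

w_teacher = [1.0, 0.0]   # unbiased teacher ignores the shortcut
w_biased  = [0.2, 1.0]   # shortcut-reliant student
w_aligned = [0.9, 0.0]   # student that copies the teacher's logic

biased_loss  = explanation_distillation_loss(w_biased,  w_teacher, batch)
aligned_loss = explanation_distillation_loss(w_aligned, w_teacher, batch)
```

A student that reproduces the teacher's relevance pattern incurs near-zero loss even at a different weight scale (the heatmaps are normalized), while a shortcut-reliant student is penalized, which is how explanation matching alone can steer the student away from the bias.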