Mitigating Clever Hans Strategies in Image Classifiers through Generating Counterexamples

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep learning models are vulnerable to spurious correlations, leading to Clever Hans–style predictions that impair out-of-distribution robustness. Existing group-based distributionally robust methods (e.g., DFR) require explicit group annotations, yet such labels are often unavailable, subgroup samples are sparse, and fine-grained group partitioning becomes infeasible under multiple confounding factors. To address these limitations, the paper proposes Counterfactual Knowledge Distillation (CFKD), a group-label-free framework that generates diverse counterfactual samples via counterfactual explanations. CFKD jointly leverages knowledge distillation and human feedback to refine decision boundaries, enabling implicit reweighting of sparse subgroups and effective data augmentation. The method naturally extends to multiple confounders and significantly improves balanced generalization under low-data regimes and strong spurious correlations. Evaluated on five synthetic and real-world industrial datasets, CFKD consistently outperforms state-of-the-art debiasing approaches.

📝 Abstract
Deep learning models remain vulnerable to spurious correlations, leading to so-called Clever Hans predictors that undermine robustness even in large-scale foundation and self-supervised models. Group distributional robustness methods, such as Deep Feature Reweighting (DFR), rely on explicit group labels to upweight underrepresented subgroups, but face key limitations: (1) group labels are often unavailable, (2) low within-group sample sizes hinder coverage of the subgroup distribution, and (3) performance degrades sharply when multiple spurious correlations fragment the data into even smaller groups. We propose Counterfactual Knowledge Distillation (CFKD), a framework that sidesteps these issues by generating diverse counterfactuals, enabling a human annotator to efficiently explore and correct the model's decision boundaries through a knowledge distillation step. Unlike DFR, our method not only reweights the undersampled groups but also enriches them with new data points. Our method does not require any confounder labels, scales effectively to multiple confounders, and yields balanced generalization across groups. We demonstrate CFKD's efficacy across five datasets, spanning synthetic tasks to an industrial application, with particularly strong gains in low-data regimes with pronounced spurious correlations. Additionally, we provide an ablation study on the effect of the chosen counterfactual explainer and teacher model, highlighting their impact on robustness.
Problem

Research questions and friction points this paper is trying to address.

Addressing Clever Hans predictors in deep learning models
Overcoming limitations of group distributional robustness methods
Generating counterfactuals to correct model decision boundaries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates diverse counterfactuals to correct decision boundaries
Uses knowledge distillation without requiring confounder labels
Enriches undersampled groups with new data points
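The loop sketched by these bullets — generate counterfactuals, attach corrected labels, then retrain on the enriched data — can be illustrated in miniature. Everything below is a hypothetical stand-in, not the paper's actual pipeline: the logistic-regression "student", the feature-flip "counterfactual explainer", and the synthetic two-feature data (feature 0 causal, feature 1 a spurious confounder) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature 0 carries the true signal, feature 1 is a spurious
# correlate (e.g. a watermark) that co-occurs with the label in training.
n = 400
y = rng.integers(0, 2, n).astype(float)
X = np.column_stack([
    y + 0.3 * rng.standard_normal(n),  # causal feature (noisier)
    y + 0.1 * rng.standard_normal(n),  # spurious confounder (cleaner)
])

def fit_logreg(X, y, epochs=500, lr=0.5):
    """Plain-numpy logistic regression standing in for the student model."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

# Baseline student: latches onto the cleaner spurious feature.
w, b = fit_logreg(X, y)

# "Counterfactual explainer" (illustrative): flip the confounder while
# keeping the causal feature, exposing inputs where the shortcut and the
# true class disagree.
X_cf = X.copy()
X_cf[:, 1] = 1.0 - X_cf[:, 1]

# Feedback step: the counterfactuals keep their original ground-truth
# labels, since only the non-causal feature was changed.
y_cf = y.copy()

# Distillation-style retraining on the enriched set, so the decision
# boundary stops relying on the now-uninformative confounder.
X_aug = np.vstack([X, X_cf])
y_aug = np.concatenate([y, y_cf])
w2, b2 = fit_logreg(X_aug, y_aug)
```

In this toy setup the retrained weights `w2` place far less mass on the spurious feature than the baseline `w`, which is the qualitative effect the bullets describe: the undersampled (counter-stereotypical) region is populated with new points rather than merely reweighted.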