🤖 AI Summary
Class imbalance often leaves minority classes underspecified, which exacerbates spurious correlations (e.g., Clever Hans effects) and impairs generalization. This paper draws a systematic link between class imbalance and spurious correlations, a perspective largely overlooked by existing approaches. It proposes an XAI framework based on counterfactual explanations: attribution analysis identifies the spurious feature dependencies that are amplified in minority classes, and counterfactual perturbations are then applied to suppress their influence. Unlike conventional reweighting approaches, the method requires no modification to the loss function or the data distribution. Evaluated on three benchmark datasets, the approach maintains overall accuracy while substantially reducing spurious correlations, yielding up to a 12.3% improvement in minority-class F1 score. These results indicate that it can enhance model robustness and fairness without compromising predictive performance.
📝 Abstract
Class imbalance poses a fundamental challenge in machine learning, frequently leading to unreliable classification performance. While prior methods focus on data- or loss-reweighting schemes, we view imbalance as a data condition that amplifies Clever Hans (CH) effects through the underspecification of minority classes. We propose a counterfactual explanation-based approach that leverages Explainable AI to jointly identify and eliminate the CH effects that emerge under imbalance. Our method achieves competitive classification performance on three datasets and demonstrates how CH effects emerge under imbalance, a perspective largely overlooked by existing approaches.
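To make the idea of a CH effect under imbalance concrete, the following is a minimal toy sketch, not the paper's method. It assumes a synthetic two-feature dataset in which a binary "artifact" feature appears only in the minority class, a plain logistic regression trained with NumPy gradient descent, and a simple counterfactual probe that removes the artifact and counts how many minority predictions flip. All function names and parameters (`make_imbalanced_data`, `fit_logreg`, `counterfactual_flip_rate`, the class sizes) are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_imbalanced_data(n_major=1000, n_minor=20, seed=0):
    """Toy data (assumption): column 0 is a noisy genuine signal,
    column 1 is an artifact that is present only in the minority class."""
    rng = np.random.default_rng(seed)
    x_major = np.column_stack([rng.normal(0.0, 1.0, n_major), np.zeros(n_major)])
    x_minor = np.column_stack([rng.normal(1.0, 1.0, n_minor), np.ones(n_minor)])
    X = np.vstack([x_major, x_minor])
    y = np.concatenate([np.zeros(n_major), np.ones(n_minor)])
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg(X, y, lr=0.5, steps=2000):
    """Unweighted logistic regression fitted by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        grad = sigmoid(X @ w + b) - y  # derivative of log-loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def counterfactual_flip_rate(X, w, b, artifact_col=1):
    """Fraction of samples predicted as class 1 that switch to class 0
    once the artifact feature is set to zero (the counterfactual input)."""
    X_cf = X.copy()
    X_cf[:, artifact_col] = 0.0
    before = sigmoid(X @ w + b) >= 0.5
    after = sigmoid(X_cf @ w + b) >= 0.5
    return float(np.mean(before & ~after))


X, y = make_imbalanced_data()
w, b = fit_logreg(X, y)
rate = counterfactual_flip_rate(X[y == 1], w, b)
print(f"weight on artifact: {w[1]:.2f}, minority flip rate: {rate:.2f}")
```

In this sketch, a large positive weight on the artifact column together with a high flip rate would suggest that the minority-class decision rests on the artifact rather than on the genuine signal. The paper's actual identification and elimination procedure is not specified in the abstract and may differ substantially.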