🤖 AI Summary
Deep neural networks often rely on spurious correlations in high-stakes scenarios, compromising their reliability. This work unifies the perspectives of distributionally robust optimization, invariant risk minimization, and shortcut learning to evaluate the debiasing efficacy of explainable AI (XAI) methods, particularly counterfactual knowledge distillation (CFKD), against non-XAI baselines under data scarcity and severe subgroup imbalance. Experimental results demonstrate that XAI approaches generally outperform non-XAI alternatives, with CFKD exhibiting the most stable generalization performance. The study further shows that the practical difficulty of acquiring group labels, together with the scarcity of minority-group samples in validation sets, makes model selection and hyperparameter tuning unreliable, hindering the deployment of robust models in real-world settings.
📝 Abstract
Deep Neural Networks (DNNs) are increasingly utilized in high-stakes domains such as medical diagnostics and autonomous driving, where model reliability is critical. However, the research landscape for ensuring this reliability is terminologically fractured across communities that pursue the same goal: ensuring models rely on causally relevant features rather than confounding signals. While frameworks such as distributionally robust optimization (DRO), invariant risk minimization (IRM), shortcut learning, simplicity bias, and the Clever Hans effect all address model failure due to spurious correlations, researchers typically reference only work within their own domains. This reproducibility study unifies these perspectives through a comparative analysis of correction methods under challenging constraints such as limited data availability and severe subgroup imbalance. We evaluate recently proposed correction methods based on explainable artificial intelligence (XAI) techniques alongside popular non-XAI baselines using both synthetic and real-world datasets. Findings show that XAI-based methods generally outperform non-XAI approaches, with Counterfactual Knowledge Distillation (CFKD) proving most consistently effective at improving generalization. Our experiments also reveal that the practical application of many methods is hindered by a dependency on group labels, as manual annotation is often infeasible and automated tools like Spectral Relevance Analysis (SpRAy) struggle with complex features and severe imbalance. Furthermore, the scarcity of minority-group samples in validation sets renders model selection and hyperparameter tuning unreliable, posing a significant obstacle to the deployment of robust and trustworthy models in safety-critical areas.
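To make the abstract's point about subgroup imbalance concrete, here is a minimal sketch (not from the paper) of worst-group accuracy, the evaluation metric that group-DRO-style methods target and that group-label-dependent model selection relies on. The group names and toy data below are hypothetical illustrations.

```python
from collections import defaultdict

def worst_group_accuracy(preds, labels, groups):
    """Return the accuracy of the worst-performing subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return min(correct[g] / total[g] for g in total)

# Toy illustration of severe subgroup imbalance: the majority group is
# classified perfectly, but two minority samples determine the score.
preds  = [1, 1, 1, 1, 1, 1, 0, 1]
labels = [1, 1, 1, 1, 1, 1, 1, 1]
groups = ["majority"] * 6 + ["minority"] * 2
print(worst_group_accuracy(preds, labels, groups))  # 0.5
```

The toy example also illustrates why scarce minority samples make model selection unreliable: with only two minority examples, a single flipped prediction swings the worst-group score by 50 percentage points, so validation-set comparisons between hyperparameter settings become mostly noise.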