Challenges and Opportunities in Improving Worst-Group Generalization in Presence of Spurious Features

📅 2023-06-21

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

This work addresses the poor worst-group generalization of deep neural networks that rely on spurious features, i.e., features correlated with the class label in most training examples but absent from minority groups. It focuses on settings that prior work has left largely unexplored: spurious features that are learned more slowly (later in training), a larger number of classes, and a larger number of groups. We show how existing group inference methods struggle when spurious features are learned late in training. To enable this evaluation, we introduce two new datasets, Animals and SUN, that cover multi-class, multi-group settings with slowly learned spurious features. We show empirically that settings with more classes and/or groups substantially lower the worst-group accuracy of all existing methods. We also show that careful model selection (hyperparameter tuning) is critical for extracting good performance, especially in these harder settings, and propose more cost-efficient model selection strategies. The benchmark covers eight state-of-the-art methods (e.g., GroupDRO) on five vision datasets, with over 5,000 trained models in total. Datasets, methods, and scripts are open-sourced.
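One way to see why slowly learned spurious features hurt group inference: several group inference methods assume that an ERM model, often stopped early, has already latched onto the spurious feature, so the examples it misclassifies are mostly minority-group examples. JTT (Just Train Twice), for instance, upweights the examples an early-stopped ERM model gets wrong. The Python sketch below is a simplified illustration of that error-set idea under this assumption, not the paper's analysis or any benchmarked method's code; the helper names `infer_minority_by_errors` and `precision_recall` and the hand-written predictions are hypothetical.

```python
from typing import List, Sequence, Tuple


def infer_minority_by_errors(preds: Sequence[int], labels: Sequence[int]) -> List[bool]:
    """Flag examples the reference model misclassifies as inferred minority examples."""
    return [p != y for p, y in zip(preds, labels)]


def precision_recall(flags: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of the inferred minority set against the true minority set."""
    tp = sum(f and t for f, t in zip(flags, truth))
    precision = tp / sum(flags) if any(flags) else 0.0
    recall = tp / sum(truth) if any(truth) else 0.0
    return precision, recall


# Toy data: the spurious attribute matches the class label except on two examples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
spurious = [0, 0, 0, 1, 1, 1, 1, 0]
true_minority = [s != y for s, y in zip(spurious, labels)]

# Hand-written predictions (illustrative, not measured):
# a checkpoint that has already learned the spurious feature predicts from it...
preds_spurious_learned = list(spurious)
# ...while a checkpoint taken before the slow spurious feature is learned
# makes errors unrelated to the groups.
preds_spurious_not_learned = [1, 0, 0, 0, 0, 1, 1, 1]

for name, preds in [
    ("spurious feature learned", preds_spurious_learned),
    ("spurious feature not yet learned", preds_spurious_not_learned),
]:
    flags = infer_minority_by_errors(preds, labels)
    print(name, precision_recall(flags, true_minority))
# spurious feature learned (1.0, 1.0)
# spurious feature not yet learned (0.0, 0.0)
```

If the spurious feature has not yet been learned at the checkpoint used for inference, the error set can miss the minority groups entirely, as in the second case.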

📝 Abstract

Deep neural networks often exploit *spurious* features that are present in the majority of examples within a class during training. This leads to *poor worst-group test accuracy*, i.e., poor accuracy for minority groups that lack these spurious features. Despite the growing body of recent efforts to address spurious correlations (SC), several challenging settings remain unexplored. In this work, we propose studying methods to mitigate SC in settings with: 1) spurious features that are learned more slowly, 2) a larger number of classes, and 3) a larger number of groups. We introduce two new datasets, Animals and SUN, to facilitate this study and conduct a systematic benchmarking of 8 state-of-the-art (SOTA) methods across a total of 5 vision datasets, training over 5,000 models. Through this, we highlight how existing group inference methods struggle in the presence of spurious features that are learned later in training. Additionally, we demonstrate how all existing methods struggle in settings with more groups and/or classes. Finally, we show the importance of careful model selection (hyperparameter tuning) in extracting optimal performance, especially in the more challenging settings we introduced, and propose more cost-efficient strategies for model selection. Overall, through extensive and systematic experiments, this work uncovers a suite of new challenges and opportunities for improving worst-group generalization in the presence of spurious features. Our datasets, methods, and scripts are available at https://github.com/BigML-CS-UCLA/SpuCo.
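Worst-group accuracy, as used in the abstract, is the lowest accuracy over all groups rather than the average over all examples. A minimal Python sketch follows, assuming (as is common in this literature, though not stated here) that a group is a (class, spurious attribute) pair; the helper names `group_accuracies` and `worst_group_accuracy` are hypothetical and are not taken from the SpuCo repository.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def group_accuracies(
    preds: Sequence[int],
    labels: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Accuracy computed separately for each group."""
    if not (len(preds) == len(labels) == len(groups)):
        raise ValueError("preds, labels and groups must have equal length")
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(preds, labels, groups) -> float:
    """Minimum per-group accuracy: exposes failures that the average hides."""
    return min(group_accuracies(preds, labels, groups).values())


# Toy data: groups are (class, spurious attribute) pairs (an assumption).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
spurious = [0, 0, 0, 1, 1, 1, 1, 0]  # attribute that usually matches the class
groups = list(zip(labels, spurious))
preds = list(spurious)  # a model that simply copies the spurious attribute

print(sum(p == y for p, y in zip(preds, labels)) / len(labels))  # 0.75
print(worst_group_accuracy(preds, labels, groups))  # 0.0
```

In the toy data, a model that predicts the spurious attribute still reaches 0.75 average accuracy while scoring 0.0 on the two minority groups, which is why the worst-group metric is reported separately.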

Problem

Research questions and friction points this paper is trying to address.

Address poor worst-group accuracy due to spurious features

Mitigate spurious correlations that are learned slowly or that arise in settings with many classes and groups

Improve model selection for better worst-group generalization
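Model selection here means choosing hyperparameters by a worst-group criterion rather than by average accuracy. A minimal Python sketch of grid search by worst-group validation accuracy follows, assuming a validation split with group annotations; `select_by_worst_group` and `toy_objective` are hypothetical names, and `toy_objective` is a synthetic stand-in for training a model and scoring it, not real results. It does not reproduce the cost-efficient strategies the paper proposes.

```python
import itertools
import math
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, float]


def select_by_worst_group(
    configs: Iterable[Config],
    worst_group_val_acc: Callable[[Config], float],
) -> Tuple[Optional[Config], float]:
    """Return the config with the highest worst-group validation accuracy.

    `worst_group_val_acc` is assumed to train a model with `cfg` and score it on a
    group-annotated validation split; here it is a caller-supplied function.
    """
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    for cfg in configs:
        score = worst_group_val_acc(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_objective(cfg: Config) -> float:
    # Synthetic placeholder (not real results): peaks at lr=1e-3, weight_decay=0.1.
    return 1.0 - 0.1 * abs(math.log10(cfg["lr"]) + 3) - abs(cfg["weight_decay"] - 0.1)


# Full grid over two hyperparameters (9 configurations).
grid = [
    {"lr": lr, "weight_decay": wd}
    for lr, wd in itertools.product([1e-4, 1e-3, 1e-2], [0.0, 0.1, 1.0])
]
best_cfg, best_score = select_by_worst_group(grid, toy_objective)
print(best_cfg)  # {'lr': 0.001, 'weight_decay': 0.1}
```

In practice each call to the scoring function is a full training run, so the grid size drives the cost; that cost is what more efficient selection strategies aim to reduce.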

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces two new datasets, Animals and SUN, for studying spurious correlations

Benchmarks 8 SOTA methods across 5 vision datasets

Proposes cost-efficient model selection strategies
