🤖 AI Summary
Existing out-of-distribution generalization methods suffer from susceptibility to spurious correlations and typically rely on multi-domain labels or hand-crafted group annotations and data augmentation—constraints that limit practical applicability. Method: This paper proposes a robust training framework that requires neither domain/group labels nor specialized data augmentation. Its core innovation is the construction of semantic invariance pairs, from which adaptive corrective gradients are derived and unsupervisedly injected into standard backpropagation to enforce invariance constraints. Contribution/Results: By eliminating dependence on multi-source domain data and explicit grouping supervision, the method enables end-to-end robust learning. Evaluated on ColoredMNIST, Waterbird-100, and CelebA under group shift settings, it achieves an average accuracy improvement of 7.2%, demonstrating significantly enhanced generalization under distributional shifts.
📝 Abstract
Out-of-distribution generalization of machine learning models remains challenging since the models are inherently bound to the training data distribution. This especially manifests, when the learned models rely on spurious correlations. Most of the existing approaches apply data manipulation, representation learning, or learning strategies to achieve generalizable models. Unfortunately, these approaches usually require multiple training domains, group labels, specialized augmentation, or pre-processing to reach generalizable models. We propose a novel approach that addresses these limitations by providing a technique to guide the neural network through the training phase. We first establish input pairs, representing the spurious attribute and describing the invariance, a characteristic that should not affect the outcome of the model. Based on these pairs, we form a corrective gradient complementing the traditional gradient descent approach. We further make this correction mechanism adaptive based on a predefined invariance condition. Experiments on ColoredMNIST, Waterbird-100, and CelebA datasets demonstrate the effectiveness of our approach and the robustness to group shifts.