🤖 AI Summary
Under empirical risk minimization (ERM), models often rely on spurious correlations, so-called "shortcuts", which leads to poor generalization on minority subgroups. To address this, we propose InterpoLated Learning (InterpoLL), a class-aware interpolation technique applied directly in the representation space: it blends the features of majority instances with those of intra-class minority instances, explicitly incorporating minority patterns and attenuating shortcut reliance. InterpoLL operates at the feature level, without modifying the model architecture or introducing additional hyperparameters, so the model learns representations that remain predictive across groups. Experiments on multiple natural language understanding benchmarks show that InterpoLL significantly improves accuracy on minority-group examples while preserving performance on majority-group examples. It consistently outperforms both standard ERM and state-of-the-art shortcut mitigation methods, and these gains hold across encoder, encoder-decoder, and decoder-only architectures.
📝 Abstract
Empirical risk minimization (ERM) incentivizes models to exploit shortcuts, i.e., spurious correlations between input attributes and labels that are prevalent in the majority of the training data but unrelated to the task at hand. This reliance hinders generalization on minority examples, where such correlations do not hold. Existing shortcut mitigation approaches are model-specific, difficult to tune, computationally expensive, and fail to improve learned representations. To address these issues, we propose InterpoLated Learning (InterpoLL), which interpolates the representations of majority examples to include features from intra-class minority examples with shortcut-mitigating patterns. This weakens shortcut influence, enabling models to acquire features predictive across both minority and majority examples. Experimental results on multiple natural language understanding tasks demonstrate that InterpoLL improves minority generalization over both ERM and state-of-the-art shortcut mitigation methods, without compromising accuracy on majority examples. Notably, these gains persist across encoder, encoder-decoder, and decoder-only architectures, demonstrating the method's broad applicability.
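The abstract does not include pseudocode, but a minimal sketch of the intra-class interpolation idea might look like the following (a PyTorch illustration; the function name `interpoll_sketch`, the uniform sampling of the interpolation weight, and the within-batch sampling of minority partners are assumptions for illustration, not the authors' exact procedure).

```python
import torch

def interpoll_sketch(feats, labels, is_majority, lam_max=0.5):
    """Blend each majority example's features with those of a randomly
    chosen minority example from the same class (illustrative sketch).

    feats:       (B, D) encoder representations
    labels:      (B,) class labels
    is_majority: (B,) bool, True for majority-group examples
    """
    mixed = feats.clone()
    for i in torch.where(is_majority)[0]:
        # candidate minority examples sharing the class of example i
        same_class = (labels == labels[i]) & (~is_majority)
        idx = torch.where(same_class)[0]
        if len(idx) == 0:
            continue  # no intra-class minority example in this batch
        j = idx[torch.randint(len(idx), (1,))].item()
        lam = torch.rand(1).item() * lam_max  # interpolation weight (assumed uniform)
        mixed[i] = (1 - lam) * feats[i] + lam * feats[j]
    return mixed  # labels are unchanged: interpolation is intra-class

# Usage: feed `mixed` (instead of the raw features) to the classification
# head and train with the usual cross-entropy loss; minority examples pass
# through unchanged.
```

Because the interpolation stays within a class, the label assigned to each mixed representation remains valid, while the injected minority features dilute whatever shortcut signal the majority example carried.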