🤖 AI Summary
This study examines how three phenomena relate: attraction toward shortcut features during training, model reliance on shortcut decision rules, and out-of-distribution (OOD) failure across distribution families. The authors construct a minimal binary classification setting with one invariant feature and one family-dependent shortcut feature, and analyze ridge-regularized logistic regression both theoretically and in synthetic experiments, under deterministic and noisy regimes. In the deterministic regime, positive average shortcut correlation pulls the model toward positive shortcut weight, but ridge regularization keeps the classifier invariant-dominated and prevents OOD failure. When the invariant feature is noisy and the training shortcut signal exceeds the invariant signal, the model switches to the shortcut rule. Whether that switch causes failure depends on the held-out family: weaker shortcut correlation yields positive excess risk, and sign-flipped shortcuts yield above-chance error.
📝 Abstract
Shortcut features are often invoked to explain out-of-distribution (OOD) failure, but training correlation, learned shortcut use, and test-time failure need not coincide. We study a minimal binary model with one invariant coordinate and one family-dependent shortcut coordinate. In the deterministic regime, positive average shortcut correlation pulls logistic ERM toward positive shortcut weight, but ridge regularization keeps the classifier invariant-dominated and prevents deterministic OOD failure. When the invariant coordinate is noisy, ridge-logistic ERM switches to the shortcut rule once the training shortcut signal exceeds the invariant signal. Whether that transition causes failure depends on the held-out family: weaker shortcut correlation yields positive excess risk, and sign-flipped families yield above-chance error. Synthetic checks match these analytic regimes and show that the same training-side transition can have different held-out consequences. The model separates shortcut attraction, shortcut-rule transition, and cross-family OOD failure.
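The regimes described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes both coordinates take values in {-1, +1}, models invariant noise as a label-flip probability `flip`, pools training data into a single shortcut correlation `rho`, and fits ridge-logistic ERM by plain gradient descent. All names and parameter values (`sample_family`, `fit_ridge_logistic`, `lam`, and so on) are hypothetical choices made for illustration.

```python
import math
import random


def sample_family(n, rho, flip, rng):
    """Draw n points (x_inv, x_short, y) with labels in {-1, +1}.

    Assumed toy model: x_inv agrees with y with probability 1 - flip
    (invariant signal 1 - 2*flip); x_short agrees with y with
    probability (1 + rho) / 2 (shortcut signal rho).
    """
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        x_inv = y if rng.random() >= flip else -y
        x_short = y if rng.random() < (1 + rho) / 2 else -y
        data.append((x_inv, x_short, y))
    return data


def fit_ridge_logistic(data, lam=0.1, lr=0.5, steps=200):
    """Full-batch gradient descent on mean logistic loss + (lam/2)*||w||^2."""
    w_inv = w_short = 0.0
    n = len(data)
    for _ in range(steps):
        g_inv = g_short = 0.0
        for x_inv, x_short, y in data:
            margin = y * (w_inv * x_inv + w_short * x_short)
            s = 1.0 / (1.0 + math.exp(margin))  # sigmoid(-margin)
            g_inv -= s * y * x_inv
            g_short -= s * y * x_short
        w_inv -= lr * (g_inv / n + lam * w_inv)
        w_short -= lr * (g_short / n + lam * w_short)
    return w_inv, w_short


def error_rate(data, w_inv, w_short):
    """Fraction of points with non-positive margin under the linear rule."""
    wrong = sum(1 for x_inv, x_short, y in data
                if y * (w_inv * x_inv + w_short * x_short) <= 0)
    return wrong / len(data)


rng = random.Random(0)

# Deterministic regime: the invariant coordinate always equals the label.
det_w = fit_ridge_logistic(sample_family(2000, rho=0.9, flip=0.0, rng=rng))
det_flipped = sample_family(2000, rho=-0.8, flip=0.0, rng=rng)
det_err = error_rate(det_flipped, *det_w)

# Noisy regime: shortcut signal (0.9) exceeds invariant signal (1 - 2*0.2 = 0.6).
noisy_w = fit_ridge_logistic(sample_family(2000, rho=0.9, flip=0.2, rng=rng))
weaker = sample_family(2000, rho=0.3, flip=0.2, rng=rng)
flipped = sample_family(2000, rho=-0.8, flip=0.2, rng=rng)
weaker_err = error_rate(weaker, *noisy_w)
weaker_inv_err = error_rate(weaker, 1.0, 0.0)  # invariant-only reference rule
flipped_err = error_rate(flipped, *noisy_w)

print("deterministic weights (inv, short):", det_w, "flipped-family error:", det_err)
print("noisy weights (inv, short):", noisy_w)
print("weaker family: learned", weaker_err, "vs invariant-only", weaker_inv_err)
print("sign-flipped family error:", flipped_err)
```

In this toy setup, the deterministic fit puts positive weight on the shortcut but keeps the invariant weight larger, so the sign of the prediction still follows the invariant coordinate on a sign-flipped family. The noisy fit uses the same training shortcut correlation but ends up shortcut-dominated, and its held-out error then depends on the test family's shortcut correlation rather than on anything visible during training.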