🤖 AI Summary
This paper investigates the "unreasonable effectiveness" of last-layer retraining (LLR) at significantly improving worst-group accuracy under class and group imbalance. Contrary to the authors' initial hypothesis, LLR's efficacy does not stem from mitigating neural collapse; instead, it arises from the better group (or class) balance of the held-out validation set on which the last layer is retrained, even when that set is imbalanced overall. Through the implicit bias of gradient descent, retraining the final linear layer on this better-balanced data yields a classifier that is more robust on minority groups. This insight unifies the understanding of methods such as class-balanced LLR (CB-LLR) and Automatic Feature Reweighting (AFR), both of which perform implicit group balancing, and it highlights the critical role of held-out-set group balance for robust generalization. Extensive experiments across multiple imbalanced benchmarks confirm that merely reinitializing and retraining the last layer suffices to substantially boost worst-group performance, and that this gain is attributable primarily to the implicit group balance of the held-out set rather than to other data-level or architectural properties.
📝 Abstract
Last-layer retraining (LLR) methods, wherein the last layer of a neural network is reinitialized and retrained on a held-out set following ERM training, have garnered interest as an efficient approach to rectify dependence on spurious correlations and improve performance on minority groups. Surprisingly, LLR has been found to improve worst-group accuracy even when the held-out set is an imbalanced subset of the training set. We initially hypothesize that this "unreasonable effectiveness" of LLR is explained by its ability to mitigate neural collapse through the held-out set, resulting in the implicit bias of gradient descent benefiting robustness. Our empirical investigation does not support this hypothesis. Instead, we present strong evidence for an alternative hypothesis: that the success of LLR is primarily due to better group balance in the held-out set. We conclude by showing how the recent algorithms CB-LLR and AFR perform implicit group-balancing to elicit a robustness improvement.
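To make the procedure concrete, the following is a minimal pure-Python sketch of group-balanced last-layer retraining: the featurizer is assumed frozen (its outputs are the `features` below), a fresh linear head is initialized from zero, and stochastic gradient steps on the logistic loss sample each group uniformly so that minority groups contribute equally to the gradient. All names and the synthetic setup are illustrative, not from the paper's actual implementation.

```python
import math
import random

def retrain_last_layer(features, labels, groups, steps=500, lr=0.5):
    """Reinitialize and retrain a linear head on held-out examples.

    features: list of feature vectors from the frozen ERM-trained featurizer
    labels:   binary labels (0/1)
    groups:   group id per example; sampling groups uniformly implements
              the group-balanced distribution the paper credits for LLR's gains
    """
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0  # reinitialized last layer

    # Index the held-out set by group for balanced sampling.
    by_group = {}
    for x, y, g in zip(features, labels, groups):
        by_group.setdefault(g, []).append((x, y))
    group_ids = list(by_group)

    for _ in range(steps):
        g = random.choice(group_ids)      # uniform over groups, not examples
        x, y = random.choice(by_group[g])
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-z))    # sigmoid
        grad = p - y                       # dL/dz for the logistic loss
        w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
        b -= lr * grad
    return w, b

def accuracy(w, b, features, labels):
    correct = sum(
        int((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1))
        for x, y in zip(features, labels)
    )
    return correct / len(features)
```

On a toy held-out set where a spurious coordinate agrees with the label in the majority groups but disagrees in the minority groups, the balanced sampling cancels the spurious coordinate's gradient contribution, so the retrained head relies on the core feature and worst-group accuracy improves. Class-balanced retraining (as in CB-LLR) is the same sketch with class labels used as the group ids.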