🤖 AI Summary
To address performance degradation caused by label shift—i.e., test-time shifts in class prior distributions—in supervised learning, this paper proposes FMAPLS, a Bayesian framework for label shift correction, and its online variant, online-FMAPLS. Methodologically, FMAPLS jointly optimizes Dirichlet hyperparameters and class priors, replacing gradient-based updates with a linear surrogate function (LSF) to yield closed-form solutions while preserving asymptotic equivalence. online-FMAPLS incorporates stochastic approximation in the E-step to enable real-time adaptation to streaming data, and the accompanying theoretical analysis characterizes the fundamental trade-off between convergence rate and estimation accuracy. Experiments on CIFAR-100 and ImageNet demonstrate that FMAPLS and online-FMAPLS reduce KL divergence by up to 40% and 12%, respectively, over state-of-the-art methods, while significantly improving calibration accuracy, robustness to distribution shifts, and scalability to large-scale datasets.
📝 Abstract
Label shift, a prevalent challenge in supervised learning, arises when the class prior distribution of test data differs from that of training data, leading to significant degradation in classifier performance. To accurately estimate the test priors and enhance classification accuracy, we propose a Bayesian framework for label shift estimation, termed Full Maximum A Posterior Label Shift (FMAPLS), along with its online version, online-FMAPLS. Leveraging batch and online Expectation-Maximization (EM) algorithms, these methods jointly and dynamically optimize Dirichlet hyperparameters $\boldsymbol{\alpha}$ and class priors $\boldsymbol{\pi}$, thereby overcoming the rigid constraints of the existing Maximum A Posterior Label Shift (MAPLS) approach. Moreover, we introduce a linear surrogate function (LSF) to replace gradient-based hyperparameter updates, yielding closed-form solutions that reduce computational complexity while retaining asymptotic equivalence. The online variant substitutes the batch E-step with a stochastic approximation, enabling real-time adaptation to streaming data. Furthermore, our theoretical analysis reveals a fundamental trade-off between online convergence rate and estimation accuracy. Extensive experiments on CIFAR-100 and ImageNet datasets under shuffled long-tail and Dirichlet test priors demonstrate that FMAPLS and online-FMAPLS respectively achieve up to 40% and 12% lower KL divergence and substantial improvements in post-shift accuracy over state-of-the-art baselines, particularly under severe class imbalance and distributional uncertainty. These results confirm the robustness, scalability, and suitability of the proposed methods for large-scale and dynamic learning scenarios.
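To make the overall pipeline concrete, the sketch below shows a generic EM-style prior estimator with a Dirichlet MAP term, plus a single stochastic-approximation E-step for the streaming case. This is an illustrative approximation in the spirit of classical EM label shift correction, not the paper's actual FMAPLS updates: the Dirichlet hyperparameters `alpha` are held fixed here (FMAPLS adapts them jointly via the LSF), and all function names and the step-size schedule are assumptions for the sketch.

```python
import numpy as np

def em_map_label_shift(probs, source_priors, alpha, n_iter=100):
    """Estimate test-time class priors from classifier posteriors.

    probs:         (n, K) source-classifier posteriors p_s(y|x) on test data
    source_priors: (K,)   class priors of the training set
    alpha:         (K,)   Dirichlet hyperparameters (fixed in this sketch)
    """
    n, K = probs.shape
    pi = np.full(K, 1.0 / K)  # uniform initialization of test priors
    for _ in range(n_iter):
        # E-step: reweight source posteriors by the current prior ratio
        w = probs * (pi / source_priors)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: MAP update; (alpha_k - 1) act as Dirichlet pseudo-counts
        pi = (w.sum(axis=0) + alpha - 1.0) / (n + alpha.sum() - K)
    return pi

def online_em_step(pi, prob_x, source_priors, step):
    """One stochastic-approximation E-step on a single streaming sample;
    `step` is the learning rate whose decay governs the convergence-rate /
    accuracy trade-off discussed in the abstract."""
    w = prob_x * (pi / source_priors)
    w /= w.sum()
    return (1.0 - step) * pi + step * w
```

With well-calibrated, fairly confident posteriors, the batch estimate converges close to the empirical test label distribution; the online step trades some accuracy for per-sample updates, mirroring the trade-off the analysis formalizes.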