🤖 AI Summary
To address the cross-node and intra-node label shifts that commonly arise in multi-node distributed learning, this paper proposes Versatile Robust Label Shift (VRLS), a method that improves model generalization to shifted test distributions without moving local data off each node. VRLS regularizes maximum-likelihood estimation of the test-to-train label density ratio with Shannon entropy and adapts the ratio during training, enabling cross-node collaborative learning with a theoretically guaranteed high-probability bound on the estimation error. Built on maximum-likelihood density ratio estimation and a distributed ratio-update mechanism, VRLS combines theoretical rigor with engineering practicality. Extensive experiments on MNIST, Fashion-MNIST, and CIFAR-10 under imbalanced label shift show that VRLS outperforms state-of-the-art baselines by up to 20% in classification accuracy, substantially mitigating the performance degradation induced by label shift.
📝 Abstract
We address the challenge of minimizing true risk in multi-node distributed learning. Such systems are frequently exposed to both inter-node and intra-node label shifts, which present a critical obstacle to optimizing model performance while keeping data confined to each node. To tackle this, we propose the Versatile Robust Label Shift (VRLS) method, which enhances maximum-likelihood estimation of the test-to-train label density ratio. VRLS incorporates Shannon entropy-based regularization and adjusts the density ratio during training to better handle label shifts at test time. In multi-node learning environments, VRLS further learns and adapts density ratios across nodes, effectively mitigating label shifts and improving overall model performance. Experiments on MNIST, Fashion-MNIST, and CIFAR-10 demonstrate the effectiveness of VRLS, which outperforms baselines by up to 20% in imbalanced settings. Our theoretical analysis further supports these results by establishing high-probability bounds on the estimation error.
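To make the test-to-train label density ratio concrete, below is a minimal sketch of plain EM-style maximum-likelihood label-ratio estimation from a trained classifier's predictions on unlabeled test data. This is only the classical baseline idea the abstract builds on: it omits VRLS's Shannon-entropy regularization and the distributed cross-node updates, and the function name and setup are illustrative, not the paper's implementation.

```python
import numpy as np

def estimate_label_ratio(preds, p_train, n_iters=100):
    """EM-style maximum-likelihood estimate of the label density ratio.

    preds   : (N, K) softmax outputs of a classifier trained on the
              source (train) distribution, evaluated on unlabeled
              test inputs.
    p_train : (K,) label marginal of the training set.
    Returns r with r[y] ~= p_test(y) / p_train(y).
    """
    q = p_train.copy()                       # current test-prior estimate
    for _ in range(n_iters):
        w = q / p_train                      # current ratio estimate
        post = preds * w                     # E-step: reweight posteriors
        post /= post.sum(axis=1, keepdims=True)
        q = post.mean(axis=0)                # M-step: updated test prior
    return q / p_train

# Toy usage: balanced training labels, test set shifted to 80/20.
p_train = np.array([0.5, 0.5])
labels = np.array([0] * 8000 + [1] * 2000)   # shifted test labels
preds = np.full((len(labels), 2), 0.05)      # near-confident classifier
preds[np.arange(len(labels)), labels] = 0.95
r = estimate_label_ratio(preds, p_train)     # ratio > 1 for class 0
```

The estimated ratio can then reweight the classifier's outputs (or its training loss) so that decisions match the shifted test label distribution; VRLS's contribution is to regularize and adapt this ratio estimator during training rather than fit it once post hoc.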