🤖 AI Summary
This work studies how to optimally fuse a model's own predictions with noisy ground-truth labels when retraining a binary classifier. Building on approximate message passing (AMP), the authors derive the Bayes-optimal aggregator function for combining the current model's predictions with the given labels, show that retraining on its output minimizes prediction error, and quantify the performance of this optimal strategy over multiple retraining rounds. The analysis covers two ground-truth settings, the Gaussian mixture model (GMM) and the generalized linear model (GLM), and is complemented by a practically usable variant of the optimal aggregator for linear probing with the cross-entropy loss. Experiments show that this aggregator outperforms baseline methods in the high label noise regime, with theoretical predictions closely matching empirical results.
📝 Abstract
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. While prior work has demonstrated the benefits of specific heuristic retraining schemes, the question of how to optimally combine the model's predictions with the provided labels remains largely open. This paper addresses this fundamental question for binary classification tasks. We develop a principled framework based on approximate message passing (AMP) to analyze iterative retraining procedures under two ground-truth settings: the Gaussian mixture model (GMM) and the generalized linear model (GLM). Our main contribution is the derivation of the Bayes-optimal aggregator function for combining the current model's predictions and the given labels, which, when used to retrain the same model, minimizes its prediction error. We also quantify the performance of this optimal retraining strategy over multiple rounds. We complement our theoretical results by proposing a practically usable version of the theoretically optimal aggregator function for linear probing with the cross-entropy loss, and demonstrate its superiority over baseline methods in the high label noise regime.
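To make the retraining loop described in the abstract concrete, below is a minimal sketch in pure NumPy. It is an illustration under stated assumptions, not the paper's derivation: the data are a synthetic two-cluster GMM, labels are flipped with a known rate `eps`, the model is a linear probe trained with cross-entropy, and the aggregator is a simple posterior that combines the model's confidence with the observed noisy label assuming that flip rate. All names (`aggregate`, `fit_logreg`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical GMM data: two Gaussian clusters with means +/- mu.
n, d = 2000, 20
mu = rng.normal(size=d) / np.sqrt(d)
y_true = rng.integers(0, 2, size=n)                 # true labels in {0, 1}
X = rng.normal(size=(n, d)) + np.outer(2 * y_true - 1, mu)

eps = 0.3                                           # assumed known label-flip rate
flip = rng.random(n) < eps
y_noisy = np.where(flip, 1 - y_true, y_true)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, targets, steps=500, lr=0.5):
    """Linear probe trained with cross-entropy on (possibly soft) targets
    via plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - targets) / len(targets)
    return w

def aggregate(q, y, eps):
    """Posterior P(true label = 1 | x, y): combine the model's confidence
    q = P(label = 1 | x) with the noisy label y under flip rate eps.
    This is an illustrative aggregator, not the paper's exact formula."""
    like1 = np.where(y == 1, 1 - eps, eps)          # P(y | true = 1)
    like0 = np.where(y == 1, eps, 1 - eps)          # P(y | true = 0)
    return q * like1 / (q * like1 + (1 - q) * like0)

# Round 0: train directly on the noisy labels.
w = fit_logreg(X, y_noisy.astype(float))

# Retraining rounds: fuse predictions with the noisy labels, refit on soft targets.
for _ in range(3):
    q = sigmoid(X @ w)
    targets = aggregate(q, y_noisy, eps)
    w = fit_logreg(X, targets)

acc = np.mean((sigmoid(X @ w) > 0.5) == y_true)
```

The aggregator is just Bayes' rule for a binary symmetric noise channel: the model's prediction acts as the prior over the true label, and the noisy label contributes a likelihood weighted by the flip rate. At `eps = 0.5` the label is uninformative and the aggregator returns the model's own prediction unchanged.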