🤖 AI Summary
This study addresses the fundamental tension between strong aggregate performance and frequent individual-level misclassifications in machine learning models trained under label noise. We introduce "individual-level regret," a metric that quantifies unforeseen misclassifications attributable to label corruption. To mitigate this issue, we propose a robust modeling framework based on resampling plausible denoised datasets, combining ensemble learning, counterfactual data generation, robust empirical risk minimization, and uncertainty calibration so that individual error probabilities become estimable. Evaluated across multiple clinical prediction tasks, our approach substantially reduces volatility in individual-level errors, improving model reliability and clinical deployability. Empirical results demonstrate a 37–62% reduction in regretful misclassifications compared to baseline methods.
📝 Abstract
Machine learning models are routinely used to support decisions that affect individuals -- be it to screen a patient for a serious illness or to gauge their response to treatment. In these tasks, we are limited to learning models from datasets with noisy labels. In this paper, we study the instance-level impact of learning under label noise. We introduce a notion of regret for this regime which measures the number of unforeseen mistakes due to noisy labels. We show that standard approaches to learning under label noise can return models that perform well at a population level while subjecting individuals to a lottery of mistakes. We present a versatile approach to estimate the likelihood of mistakes at the individual level from a noisy dataset by training models over plausible realizations of datasets without label noise. This is supported by a comprehensive empirical study of label noise in clinical prediction tasks. Our results reveal how failure to anticipate mistakes can compromise model reliability and adoption, and demonstrate how we can address these challenges by anticipating and avoiding regretful decisions.
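The core idea in the abstract, estimating each individual's likelihood of a mistake by training models over plausible realizations of the dataset without label noise, can be sketched as follows. This is a minimal illustration, not the paper's exact method: it assumes a symmetric label-flip noise model with a known rate `eps`, uses a toy nearest-centroid classifier as a stand-in for any model class, and measures per-individual disagreement between the ensemble of "plausibly clean" models and the model fit directly on the noisy labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy noisy binary dataset: true labels depend on the first two features,
# then each label is flipped independently with probability eps.
n, d, eps = 400, 5, 0.2
X = rng.normal(size=(n, d))
y_clean = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
flip = rng.random(n) < eps
y_noisy = np.where(flip, 1 - y_clean, y_clean)

def fit_centroids(X, y):
    """Nearest-centroid classifier; a stand-in for any learning algorithm."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    d0 = np.linalg.norm(X - centroids[0], axis=1)
    d1 = np.linalg.norm(X - centroids[1], axis=1)
    return (d1 < d0).astype(int)

def sample_plausible_labels(y, eps, rng):
    # Simplified symmetric noise model: flip each observed label back
    # with probability eps to draw one plausible clean realization.
    return np.where(rng.random(len(y)) < eps, 1 - y, y)

# Train one model per plausible clean realization of the dataset.
preds = np.stack([
    predict(fit_centroids(X, sample_plausible_labels(y_noisy, eps, rng)), X)
    for _ in range(100)
])

# Per-individual mistake likelihood: how often the ensemble disagrees
# with the model trained directly on the noisy labels.
base_pred = predict(fit_centroids(X, y_noisy), X)
mistake_prob = (preds != base_pred).mean(axis=0)
```

Individuals with high `mistake_prob` are the ones subjected to the "lottery of mistakes" the abstract describes: their prediction is unstable across plausible clean datasets, so a regretful decision can be anticipated and flagged before deployment.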