🤖 AI Summary
To address the loss of classification robustness when foundation models are adapted with noisy labels, this paper proposes a two-stage, geometry-aware reliability modeling framework that requires no retraining. The first stage estimates the reliability of the training labels with a noise-robust estimator that accommodates diverse noise patterns, including symmetric and asymmetric label noise. The second stage performs reliability-weighted inference over a local neighborhood built with the non-negative kernel (NNK) construction, which mitigates the sensitivity of conventional kNN to distance metrics and neighborhood size. The method operates entirely on frozen, pre-trained embeddings without modifying model parameters. Experiments on CIFAR-10 and DermaMNIST show that the approach outperforms standard kNN and recent adaptive-neighborhood baselines, with more stable classification accuracy across label noise settings.
📝 Abstract
Foundation models (FMs) pretrained on large datasets have become fundamental for various downstream machine learning tasks, particularly in scenarios where obtaining perfectly labeled data is prohibitively expensive. In this paper, we assume an FM has to be fine-tuned with noisy data and present a two-stage framework to ensure robust classification in the presence of label noise without model retraining. Recent work has shown that simple k-nearest neighbor (kNN) approaches using an embedding derived from an FM can achieve good performance even in the presence of severe label noise. Our work is motivated by the fact that these methods make use of local geometry. Following a similar two-stage procedure (reliability estimation followed by reliability-weighted inference), we show that improved performance can be achieved by introducing geometry information. For a given instance, our proposed inference uses a local neighborhood of training data, obtained using the non-negative kernel (NNK) neighborhood construction. We propose several methods for reliability estimation that can rely less on distance and local neighborhood as the label noise increases. Our evaluation on CIFAR-10 and DermaMNIST shows that our methods improve robustness across various noise conditions, surpassing standard kNN approaches and recent adaptive-neighborhood baselines.
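The two-stage procedure described in the abstract (reliability estimation followed by reliability-weighted inference) can be illustrated with a minimal sketch on frozen embeddings. This is not the paper's method: it substitutes plain Euclidean kNN for the NNK neighborhood construction, and it uses neighbor-label agreement as a stand-in reliability score. The function names, the toy data, and the choice of `k` are all assumptions made for illustration.

```python
import numpy as np

def knn_indices(queries, refs, k, exclude_self=False):
    """Indices of the k nearest reference points (Euclidean) for each query."""
    # Pairwise squared distances, shape (n_queries, n_refs).
    d2 = ((queries[:, None, :] - refs[None, :, :]) ** 2).sum(axis=-1)
    if exclude_self:
        # Only meaningful when queries and refs are the same array.
        np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def estimate_reliability(emb, noisy_labels, k):
    """Stage 1 (assumed score): fraction of a point's neighbors sharing its label."""
    nbrs = knn_indices(emb, emb, k, exclude_self=True)
    agree = noisy_labels[nbrs] == noisy_labels[:, None]
    return agree.mean(axis=1)

def predict(test_emb, train_emb, noisy_labels, reliability, k, n_classes):
    """Stage 2: each neighbor votes for its label, weighted by its reliability."""
    nbrs = knn_indices(test_emb, train_emb, k)
    votes = np.zeros((len(test_emb), n_classes))
    for c in range(n_classes):
        votes[:, c] = (reliability[nbrs] * (noisy_labels[nbrs] == c)).sum(axis=1)
    return votes.argmax(axis=1)

# Toy "embeddings": two well-separated clusters, five points each.
train_emb = np.array(
    [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
     [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]],
    dtype=float,
)
clean_labels = np.array([0] * 5 + [1] * 5)
noisy_labels = clean_labels.copy()
noisy_labels[0] = 1  # inject one wrong label into the first cluster

reliability = estimate_reliability(train_emb, noisy_labels, k=3)
test_emb = np.array([[0.2, 0.2], [10.2, 10.2]])
pred = predict(test_emb, train_emb, noisy_labels, reliability, k=3, n_classes=2)
print(reliability[0], pred)
```

In this toy setup the mislabeled point disagrees with all of its neighbors, so it receives a reliability of zero and contributes nothing to the vote at inference time. An agreement score of this kind depends entirely on the local neighborhood, which is the dependence the abstract says its estimators can reduce as label noise increases.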