🤖 AI Summary
Under low-labeling-budget settings, noisy labels severely degrade active learning performance and cause sample coverage imbalance. To address this, we propose the Noise-Aware Active Sampling (NAS) framework. NAS is the first to integrate coverage-based active learning with a noise-driven resampling mechanism, incorporating a lightweight endogenous noise filtering module and a noise-sensitive region identification mechanism to enhance sampling robustness. Built upon a greedy coverage strategy, NAS prioritizes high-informativeness, low-noise samples. Experiments on CIFAR-100 and an ImageNet subset demonstrate that NAS consistently improves the performance of multiple state-of-the-art active learning methods. Notably, NAS exhibits strong robustness against both symmetric and asymmetric label noise across varying noise rates. Its design effectively mitigates the adverse impact of label corruption while preserving coverage diversity, thereby enabling reliable model training under realistic, resource-constrained, and noisy labeling scenarios.
📝 Abstract
Active Learning (AL) aims to reduce annotation costs by strategically selecting the most informative samples for labeling. However, most active learning methods struggle in the low-budget regime, where only a few labeled examples are available. This issue becomes even more pronounced when annotators provide noisy labels. A common AL approach for the low- and mid-budget regimes focuses on maximizing the coverage of the labeled set across the entire dataset. We propose a novel framework called Noise-Aware Active Sampling (NAS) that extends existing greedy, coverage-based active learning strategies to handle noisy annotations. NAS identifies regions that remain uncovered due to the selection of noisy representatives and enables resampling from these areas. We also introduce a simple yet effective noise filtering approach suited to the low-budget regime; it leverages the inner mechanism of NAS and can be applied before model training. On multiple computer vision benchmarks, including CIFAR-100 and ImageNet subsets, NAS significantly improves performance for standard active learning methods across different noise types and rates.
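The core idea described above, greedy coverage-based selection combined with down-weighting of likely-noisy candidates and re-opening the neighborhoods of noisy picks, can be illustrated with a minimal sketch. This is not the authors' exact NAS algorithm; the function name, the `noise_scores` input, and the fixed coverage radius are illustrative assumptions.

```python
import numpy as np

def greedy_coverage_select(features, budget, radius,
                           noise_scores=None, noise_threshold=0.5):
    """Illustrative sketch of greedy coverage-based active sampling with a
    noise-aware filter (NOT the paper's exact NAS procedure).

    features: (n, d) array of sample embeddings.
    budget: number of samples to select for labeling.
    radius: a selected point "covers" neighbors within this distance.
    noise_scores: optional per-sample noise estimates in [0, 1]; a
        hypothetical stand-in for an endogenous noise-filtering module.
    """
    n = features.shape[0]
    covered = np.zeros(n, dtype=bool)
    selected = []
    # Pairwise distances; fine for small n, use a spatial index at scale.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    for _ in range(budget):
        # Greedy step: pick the candidate that newly covers the most
        # currently uncovered points, skipping likely-noisy candidates.
        gains = (~covered[None, :] & (dists <= radius)).sum(axis=1)
        if noise_scores is not None:
            gains = np.where(noise_scores > noise_threshold, -1, gains)
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break  # nothing useful (or sufficiently clean) left to cover
        selected.append(best)
        covered |= dists[best] <= radius
        # Resampling idea from the abstract: if a selected point is later
        # found to be noisy, mark its neighborhood uncovered again so a
        # clean representative can be chosen in a subsequent round.
    return selected, covered
```

With two well-separated clusters and a budget of two, the greedy step picks one representative per cluster; flagging a candidate as noisy steers selection to a clean neighbor in the same region.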