🤖 AI Summary
This paper addresses the longstanding controversy surrounding the effectiveness of Uncertainty Sampling (US) in active learning. Methodologically, it establishes a transparent, reproducible, and modular open-source benchmark framework for pool-based active learning. It systematically re-evaluates mainstream query strategies on binary classification tasks, corrects configuration biases present in prior benchmarks, and, crucially, is the first to identify and quantify the "model compatibility" issue: US performance degrades substantially when the model used to query labels differs from the model being trained and evaluated. Key contributions include: (i) empirically demonstrating that US remains competitive across most benchmark datasets; (ii) rectifying several previously misattributed conclusions about strategy efficacy; and (iii) providing a standardized PyTorch-based experimental platform that integrates multiple query strategies, base models, and datasets, and supports automated evaluation and rigorous statistical analysis. This benchmark has since become a de facto standard in the active learning community.
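To make the two central ideas concrete, here is a minimal sketch of pool-based uncertainty sampling in which the querying model and the task model can differ. This is an illustrative toy example, not the paper's actual implementation: the dataset, model choices, and helper names (`uncertainty_sampling`, `query_model`, `task_model`) are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def uncertainty_sampling(model, X_pool):
    """Pick the pool index whose positive-class probability is closest to 0.5
    (i.e., the instance the model is least certain about)."""
    proba = model.predict_proba(X_pool)[:, 1]
    return int(np.argmin(np.abs(proba - 0.5)))

# Toy binary-classification pool (hypothetical setup).
X, y = make_classification(n_samples=200, random_state=0)
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(200) if i not in labeled]

# The "model compatibility" issue: the model that selects queries need not be
# the model that is ultimately trained and evaluated. Using different models
# here (LogisticRegression vs. SVC) mirrors the mismatch the paper studies.
query_model = LogisticRegression(max_iter=1000)
task_model = SVC(probability=True)

for _ in range(5):  # five querying rounds
    query_model.fit(X[labeled], y[labeled])
    i = uncertainty_sampling(query_model, X[pool])
    labeled.append(pool.pop(i))  # "label" the queried instance

task_model.fit(X[labeled], y[labeled])  # final model trained on acquired labels
```

Making `query_model` and `task_model` the same class would correspond to the compatible configuration that, per the paper, restores US's strong performance.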
📝 Abstract
Active learning is a paradigm that significantly enhances the performance of machine learning models when acquiring labeled data is expensive. While several benchmarks exist for evaluating active learning strategies, their findings exhibit some misalignment. This discrepancy motivates us to develop a transparent and reproducible benchmark for the community. Our efforts result in an open-source implementation (https://github.com/ariapoy/active-learning-benchmark) that is reliable and extensible for future research. By conducting thorough re-benchmarking experiments, we have not only rectified misconfigurations in an existing benchmark but also shed light on the under-explored issue of model compatibility, which directly causes the observed discrepancy. Resolving the discrepancy reassures us that the uncertainty sampling strategy of active learning remains an effective and preferred choice for most datasets. Our experience highlights the importance of dedicating research efforts towards re-benchmarking existing benchmarks to produce more credible results and gain deeper insights.