🤖 AI Summary
Open-set active learning faces two key challenges: unlabeled pools that contain unknown-class samples, and high annotation costs. To address these, we propose SAMOSA, an active learning algorithm that combines Sharpness-Aware Minimization (SAM) with sample typicality modeling. SAMOSA operates on the embedding manifold to identify atypical (low-typicality) samples that lie near the model's decision boundaries. Such samples are both highly informative for the target classes and useful for separating target classes from irrelevant (unknown-class) instances. The querying criterion builds on theoretical findings about how data typicality affects the generalization of stochastic gradient descent (SGD) and SAM. Experiments on several benchmark datasets show that SAMOSA outperforms state-of-the-art methods by up to 3% in classification accuracy under open-set conditions, without introducing additional computational overhead.
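To make the selection idea concrete, here is a minimal sketch of ranking pool samples that are both atypical and close to a decision boundary. It is not SAMOSA's actual criterion, which the summary does not specify. Everything in it is an assumption made for illustration: typicality is approximated by the inverse mean distance to the `k` nearest neighbours in embedding space, boundary proximity by the gap between the two largest class probabilities, and the two are combined by a simple rank sum. The names `typicality`, `margin`, `ranks`, and `query` are hypothetical, and the sketch does not model the SAM-based component or any unknown-class handling.

```python
import math

def typicality(embeddings, k=2):
    """Assumed proxy: inverse mean distance to the k nearest neighbours."""
    scores = []
    for i, x in enumerate(embeddings):
        dists = sorted(math.dist(x, y) for j, y in enumerate(embeddings) if j != i)
        if not dists:                      # a single point has no neighbours
            scores.append(0.0)
            continue
        nearest = dists[:k]
        scores.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))
    return scores

def margin(probs):
    """Gap between the two largest class probabilities (small = near a boundary)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def ranks(values):
    """Rank positions in ascending order (0 = smallest value)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result

def query(embeddings, probs, budget, k=2):
    """Hypothetical rule: pick samples with low typicality AND low margin."""
    typ_rank = ranks(typicality(embeddings, k))
    mar_rank = ranks([margin(p) for p in probs])
    combined = [t + m for t, m in zip(typ_rank, mar_rank)]
    return sorted(range(len(combined)), key=lambda i: combined[i])[:budget]

# Toy pool: three clustered, confidently classified points and one isolated,
# ambiguous point (index 3).
pool = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
preds = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.55, 0.45]]
selected = query(pool, preds, budget=1)
```

In this toy pool the isolated, ambiguous point is ranked first because it has both the lowest assumed typicality and the smallest margin. A real open-set querying rule would also need a way to avoid spending the budget on unknown-class samples, which this sketch leaves out.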
📝 Abstract
Modern machine learning solutions require extensive data collection, and labeling that data remains costly. To reduce this burden, open-set active learning approaches aim to select informative samples from a large pool of unlabeled data that includes irrelevant or unknown classes. In this context, we propose Sharpness Aware Minimization for Open Set Active Learning (SAMOSA) as an effective querying algorithm. Building on theoretical findings concerning the impact of data typicality on the generalization properties of traditional stochastic gradient descent (SGD) and sharpness-aware minimization (SAM), SAMOSA actively queries samples based on their typicality. Specifically, SAMOSA identifies atypical samples that lie in regions of the embedding manifold close to the model's decision boundaries. As a result, SAMOSA prioritizes samples that are (i) highly informative for the targeted classes, and (ii) useful for distinguishing between targeted and unwanted classes. Extensive experiments show that SAMOSA achieves up to 3% accuracy improvement over the state of the art across several datasets, without introducing computational overhead. The source code of our experiments is available at: https://anonymous.4open.science/r/samosa-DAF4
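For readers unfamiliar with SAM, the generic update it refers to can be sketched as follows. This is the standard two-step SAM rule on a toy quadratic objective, not SAMOSA's querying code: the weights are first perturbed along the normalized gradient by a radius `rho`, and the gradient at that perturbed point is then used to update the original weights. The function names (`sam_step`, `quad_grad`, `loss`), the toy objective, and the values of `lr` and `rho` are assumptions chosen for illustration.

```python
import math

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM update: probe a nearby worst-case point, then descend from w."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)                     # stationary point: nothing to do
    # Step 1: move to the (first-order) worst-case point within radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: apply the gradient taken at w_adv to the ORIGINAL weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def loss(w):
    """Toy objective f(w) = 0.5 * (w0^2 + 10 * w1^2)."""
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def quad_grad(w):
    """Analytic gradient of the toy objective."""
    return [w[0], 10.0 * w[1]]

w = [1.0, 1.0]
for _ in range(50):
    w = sam_step(w, quad_grad)
```

Plain SGD would use `g` directly in the final update. SAM instead uses the gradient at the perturbed point, which biases training toward flatter regions of the loss surface at the cost of a second gradient evaluation per step.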